(U) EXECUTIVE SUMMARY:

(S/NF/SG/LIMDIS) In compliance with the Congressional
conferees' request (Appendix A), DIA proposes to develop a
multi-year research and development program, subject to rigorous
scientific and technical oversight, to demonstrate the scientific
validity of the STAR GATE program, and to show that results of
military and intelligence value can be obtained in a cost-effective
manner using anomalous mental phenomena (AMP).

(S/NF/SG/LIMDIS) This proposed program, if successfully 
implemented, will: 

- Identify the underlying mechanisms of AMP. 

- Establish the limits of operational usefulness of 
AMP. 

- Determine the degree to which foreign activities in
AMP represent a threat to national security.

- Lead to the development of countermeasures to 
neutralize this threat. 

- Use research findings to improve operational 
activities. 

- Develop data fusion criteria to integrate AMP results 
with other intelligence sources. 

(S/NF/SG/LIMDIS) Due to the diversity of the STAR GATE
mission/objectives, both external resources and in-house
expertise are required. Since this Activity possesses no in-house
R&D capability, external R&D support is absolutely required to
meet the congressional concerns addressed in this program plan.
A balance will be maintained between external
and in-house activities, and every effort will be made to
integrate and link these activities where appropriate. The
external aspect permits a wide range of expertise covering many
disciplines to be focused on this area; this also has the benefit
of ensuring peer group review and of facilitating a variety of
scientific interactions. In-house personnel with a wide range of
expertise in this phenomenology will need to be retained to make
this proposed plan work.

(S/NF) In order to review the major tenets of the draft 
program plan, the Defense Intelligence Agency will convene a 
panel of appropriate scientists to provide recommendations on the 
plan and the research it achieves. Based on the panel's 
recommendations, the Defense Intelligence Agency will then submit 
a budget line item to fund those approved objectives. 


SECRET 

NOT RELEASABLE TO FOREIGN NATIONALS 
STAR GATE 
LIMDIS 


Approved For Release 2003/04/18 : CIA-RDP96-00789R002700020001-0 


(C) An annual report will document the current 
operational, technical and administrative status of the program. 

I. (U) INTRODUCTION:

(S/NF/SG/LIMDIS) This program plan was developed in 
response to a Defense Authorization Conference, Congressionally 
Directed Action (CDA) to prepare a long-term systematic and 
comprehensive research and peer review plan in order to 
investigate anomalous mental phenomena (AMP) , and to apply 
program research results to potential operational activities.
This plan also describes key in-house activities along with an
appropriately integrated basic and applied external research 
support effort. 

(S/NF/SG/LIMDIS) Specifically, this program plan 
represents DIA's view on how best to proceed with both in-house 
activities and external research support for the period of FY95 
through FY99. Research findings, both domestic and foreign, and 
results from operational activities may lead to updates of this 
plan in order to reflect improved phenomena understanding and to 
pursue follow-on research and/or application directions.

(S/NF/SG/LIMDIS) An underlying and fundamental premise
governing the implementation of this program plan is that a
well-integrated interdisciplinary approach is the
most appropriate strategy for conducting research in this diverse
field. Consequently, this plan includes a wide variety of
research topics which are based on recent findings from
leading-edge pursuits in other disciplines that are suspected of
being germane to STAR GATE. Other topics are derived from a review
of worldwide research, consultations with leading area experts,
and insights gained from previous research and application
activities associated with the STAR GATE program.

(S/NF/SG/LIMDIS) This program plan also allows for the 
STAR GATE program to show results that are cost effective and 
will at the same time satisfy reasonable program performance 
criteria. The implementation of this program plan will preclude
the recurrence of the yearly cycle of project start-up, limited
progress, and anticipated project shutdown that previously
inhibited program activity.

(S/NF/SG/LIMDIS) In sum, the implementation of this
research and peer review plan will allow DIA to successfully
accomplish identified R&D activities which, in turn, will enhance
the capability of STAR GATE personnel to engage in operational
activities and to assess the work done by potential adversaries,
thereby reducing the risk of technological surprise.

(U) Terminology and definitions are discussed in
Appendix B.

II. (U) PLAN OBJECTIVES:

(S/NF/SG/LIMDIS) The objective of this follow-on research
and peer review plan is to advance phenomena understanding
and/or validation, applications understanding, and operational
feasibility evaluation. This continued work
will have a direct bearing on DIA's ability both to assess the
significance of foreign research and to perform a systematic
review of potential applications of these phenomena.

(S/NF/SG/LIMDIS) Accomplishment of the various activities 
identified in this plan will further enhance threat assessment of 
foreign achievements in this area, and will help achieve the 
potential for U.S. military/intelligence applications on select
tasks as a supplement to HUMINT operations. 

(U) It is anticipated that this plan will assist decision 
makers in their review and consideration of future directions for 
this field, and that this plan can begin formal implementation 
starting in FY95. 

(S/NF/SG/LIMDIS) In compliance with the Congressional 
conferees' request, DIA recommends that a period of six to nine 
months be set aside at the beginning of this new program for the 
purpose of identifying the most promising and cost-effective 
experiments to be conducted under the program to meet the overall 
research objectives outlined below. It is further suggested that 
a series of small working groups consisting of scientific experts 
from a variety of pertinent disciplines meet during this time 
period to accomplish this end. 

III. (U) SIGNIFICANCE OF EFFORT:

(S/NF/SG/LIMDIS) STAR GATE is a dynamic approach for 
pursuing the largely unexplored area of human consciousness and 
subconsciousness interaction. Its scope is comprehensive; a wide 
range of phenomenological issues are examined that include 
psychological, physiological/neurophysiological, physics and 
other leading-edge scientific areas. Although broad in scope, 
STAR GATE is well grounded due to its solid independent 
scientific review base. STAR GATE is based on a dynamic style in 
all its endeavors, especially in its pursuit of on-going foreign 
activities in this area. 

(S/NF/SG/LIMDIS) One of the tasks previously levied on DIA 
by the FY91 Defense Authorization Act was to develop a long-range 
comprehensive plan for investigating parapsychological phenomena. 
This task was one of several objectives included in a new program 
for this phenomenological area that identified DIA as executive 
agent. Moreover, the FY91 Defense Authorization Act authorized
a funding level of $2 million for DIA in order to
initiate this new program. As a result, a balanced and
integrated plan to include operations, foreign assessment, and
research and development was implemented. In addition, a new
DIA limited dissemination (LIMDIS) program, codeword STAR GATE, 
was established in order to accomplish the objectives that were 
set forth in this plan. 


(S/NF/SG/LIMDIS) The external research support conducted
under monies appropriated to date comes to a close in the June
1994 time frame. When research activities utilizing human
subjects are interrupted, it has generally been necessary to
begin again rather than resume activities from the point of
termination. Consequently, it is
important for the STAR GATE program to remain stable. Research
involving human use differs considerably from that involving
physical systems. For example, data from human subjects cannot
be collected or analyzed as rapidly, in that additional
empirical data are often required to reach analytical conclusions.
This type of data analysis utilizing human subjects can only be
achieved with an in-place, uninterrupted, multi-year research and
development program. Therefore, should it be decided to go
forward with this program, it should be done in a timely fashion.

(S/NF) The funding allocation for external research
received by STAR GATE in FY91 and continued through FY93
permitted several important research areas to be initiated and
continued. It is anticipated that results of this research will
assist in clarifying some of the possible future research
directions; consequently, not all long-range research
possibilities can be identified in this plan. However, most of
the major investigation areas can be addressed, and many of the
specifics can be identified with reasonable confidence.

Figure 1 presents an overview of overall research objectives for 
both Anomalous Cognition (AC) and Anomalous Perturbation (AP) 
which will be considered for inclusion in this program. 

(S/NF) Previous basic research activities from FY91
through FY93 focused on the following: (1) validating findings
from previous magnetoencephalograph (MEG) research and initiating
new work with a variety of conditions and individuals; (2)
performing a variety of anomalous cognition (AC) experiments to
determine potential correlations (e.g., target type,
environmental factors); (3) developing various theoretical
constructs that might be testable and that could help explain the
phenomena; (4) examining effects of altered states on data
quality; (5) initiating review of and research into the
energetics area; and (6) examining various application
possibilities (e.g., communication, search).

(U) Results from previous basic and applied research 
activity have been factored into this research and development 
plan and provide the basis upon which further R&D efforts will be 
built. 


IV. (U) PLAN OVERVIEW:

A. (U) BASIC RESEARCH OBJECTIVES

(S/NF/SG/LIMDIS) The objective of basic research is to 
understand the fundamental, underlying mechanisms for AMP. To 
achieve this objective in an efficient way, basic research of the 
detection mechanism should begin in a conservative direction.
That is, assume that a putative "sensorial" system exists for AMP
and that it most likely will behave similarly to those common
elements which are known through the five senses. This
conservative approach generalizes to understanding the source of
AMP and its propagation mechanisms (Figure 1).

B. (U) APPLIED RESEARCH OBJECTIVES

(S/NF/SG/LIMDIS) The objective of applied research is 
to improve AMP functioning to its maximum possible limit. To 
realize this objective, it is critical to define AMP output
measures that are consistent with a laboratory setting
and/or an operational environment. The approach should also
reflect scientific conservatism. In investigating any single
variable (e.g., different training methodologies), all other
variables should remain as constant as possible (e.g., use the
same individuals and known good target systems).

C. (U) FOREIGN ASSESSMENT SUPPORT OBJECTIVES

(S/NF) From a research perspective, the objective of 
foreign assessment is to determine the degree to which claims 
from foreign laboratories can be confirmed in a U.S.-based
setting. In science, replication is critical for understanding.

V. (U) BASIC RESEARCH PLAN FOR ANOMALOUS COGNITION:

A. (U) BASIC APPROACH 

(S/NF) The link of basic and applied research with
other applications investigations or with research activities is
shown in Figure 2. The top of the chart shows that for any
research or application task, certain conditions must be met
(e.g., a reliable calibrated individual is required; proper
scientific procedures need to be developed, etc.). Once these
basic foundations are laid, then basic/applied research can be 
initiated with a reasonable expectation of success and with 
assurance that results will not be ambiguous or fail scientific 
scrutiny. 

(S/NF) This chart also illustrates the difference 
between basic and applied research; applied research relates to 
various methods for collecting, recording, improving and 
analyzing data output, while basic research is aimed at phenomena 
understanding. In this chart, the "detector" is the human
brain/mind, the "source" is the target or an aspect of the
target, and "transmission" refers to notions of how information
and/or energy are actually transmitted between source and
detector.

(U) Figure 3 illustrates the interdisciplinary scope 
that will be brought to bear on this research problem. Leading- 
edge researchers in their various fields can provide clues, if 
not make direct contributions, that will assist in phenomena and 
applications understanding. Appendix C lists candidate research 
support facilities that could be involved in this long-range 
effort. Appendix D outlines pertinent research literature 
applicable to this field. Final selection will be based on how
well the activities of these institutions will fit into specific
time-lines and priorities to be established in FY95. Figure 4
lists milestones for the anomalous cognition basic research to be
conducted under this plan.

B. (U) RESEARCH DETAILS 

1. (U) Source.

(S/NF/SG/LIMDIS) Source research will address
those topics that show promise for understanding the
characteristics of the target or target area that may play a role
in anomalous cognition (AC) occurrence and data quality. Aspects
of the target that can be defined by conventional information
theory (involving entropy/information content) will be explored
in depth. A wide variety of targets with a wide range of
information content, dynamics, or other parameters will be
examined to explore this possible link. If not successful, other
approaches to investigate the targets' innate nature and its
possible link to phenomenon occurrence will be initiated.
Definitive data in this area would also have implications for
defining those targets which have the highest probability of
successful data acquisition in an operational setting, thus
establishing operational tasking parameters.

2. (U) Transmission.

(S/NF) The pursuit of possible transmission
mechanisms for AC phenomena is essentially the most significant
basic research task and also the most difficult to formulate. In
this effort, a theoretical basis will be developed from
extensions of current theory in light of recent advanced physics
formulations. Some of these formulations permit unusual
"information flows" that may, in fact, have relevance for this
phenomenon. Testable models/constructs will be developed and
evaluated. A variety of other possible explanations involving 
extensions of gravitation theory, quantum physics or other areas 
will be constructed and tested where possible. Some of these 
tests may require close cooperation of leading-edge researchers 
using equipment in their facility. 

(C/NF) Effort in this area will also focus on 
integrating diverse aspects of the source, transmission, and 
detector categories. For example, it will examine how 
"targeting" occurs. Insight will be drawn from in-depth reviews 
of various unusual physical effects identified by physical
sciences researchers. These include distant particle coupling
(Bell's theorem), ideas from quantum gravity, possible 
electrostatic/gravity interactions, unusual quantum physics, 
observational theories, vacuum "energy" potential, and a variety 
of other concepts. 

(S/NF) Perhaps the most promising exploratory 
model of all is one based on little-understood aspects of the 
fundamental equations for electromagnetic wave propagation 
(Maxwell's equations). These equations indicate that forms of 
"wave propagation" could also exist that do not have the 
conventional electric or magnetic field components (i.e., vector 
and scalar waves). These waves would not be blocked by matter
and therefore could be leading candidates for AC propagation or 
for certain aspects of AC phenomenon. 

[illegible] waves are considered a leading candidate for
transmissions by their researchers. Pilot study investigations in
this area were conducted by PAG-TA in FY92 with promising
preliminary results. Future research could couple with other DIA
exploratory R&D efforts in this area currently being explored.

(S/NF/SG/LIMDIS) Research on this topic will be
closely integrated with research involving the anomalous
perturbation (AP) aspect, since findings in the AP area would have
direct implications for phenomena transmission mechanisms in
general. Findings from the target (or target source) research
area would also provide insight into possible transmission
mechanisms. For example, different forms of the same target
(e.g., target size, 2D vs 3D, holographic representations) may
show patterns in the AC data that might provide clues regarding
phenomena mechanisms.

3. (U) Detector. 

(U) The most important and promising route to
understanding the nature of the AC detection system in humans is
through modern advances in neuroscience. Earlier
neurophysiological results obtained from magnetoencephalograph
(MEG) measurements begun in FY92 will be validated and expanded.
This earlier work indicated that MEG correlations between visual
evoked response areas of the brain may exist, and that remote
stimuli might also be detectable in MEG data. Some of the
specific investigations will examine a variety of near and far
field situations, other sensory modes, and different types of
individuals in order to search for potential variables. It might
be possible, with advanced MEG instrumentation, to actually
locate the exact brain areas involved in AC phenomena occurrence.
Future research in this area could couple with research currently
being explored at the National Laboratory.

(U) Other physical/psychophysical aspects of the 
central nervous system (CNS) will also be explored to look for 
possible correlates. This would include galvanic skin responses 
(GSR) or other parameters. 

(U) Related to this overall area are several 
investigations that relate to possible environmental interactions 
with the brain that could affect AC data. This would include 
possible geomagnetic or electromagnetic influences. 

(S/NF) A spin-off from findings in this basic
research area could be unique communication applications.
MEG correlates might exist between remotely located people. If
so, transmission of remote messages (via a
type of code) might be possible. Since AC phenomenon is not
degraded by distance or shielding, the potential of transmitting
basic "messages" to individuals in submarines would exist.
Preliminary exploration of this application by PAG-TA has yielded 
promising results. 

(S/NF) Another potential spin-off benefit of
detector research in this program is that new insights into brain
memory or parallel processing might be achieved. This could lead
to new directions in advanced computer developments involving
neural networks. For example, recent [illegible] that
"wave-like" brain activity occurs in addition to usual neuronal
processes. This wave-like phenomenon may have some link to the
"phase shift" observed in MEG data from the previous MEG project.
Further MEG work involving remote stimuli may help clarify such
issues.

4. (U) Integration.

(U) The basic research activities will liberally
avail themselves of the existing research communities that
specialize in neuroscience, physics, and statistics and the
broader psychological/social sciences. Direct support with a
variety of university departments, national and international,
will be explored. PAG-TA contacts with such national laboratories
as Los Alamos, Lawrence Livermore, Oak Ridge, and have indicated
an interest on their part in supporting the research efforts.
Frequent conferences and data exchanges are anticipated. These
data exchanges will ensure that a proper interdisciplinary
approach is maintained, and that findings from other disciplines
will be incorporated in this program where appropriate. This
peer group dialogue will greatly benefit research sponsored
through this plan: new ideas will be generated, and clues
regarding phenomena operation may be easier to identify.

(U) Some specific interdisciplinary examples that 
will benefit this program are as follows: 

- In 1990 The American Anthropological 
Association (AAA) formed a new division, the Society for the 
Anthropology of Consciousness (SAC). This division has 
established a technical journal to support interdisciplinary, 
cross-cultural, experimental, and theoretical approaches to the 
study of consciousness. This group may be able to contribute to
this program by providing cross-cultural examples. These members
might also assist in the assessment of foreign data in this area. 

- The psychophysiology of vision has already 
contributed to the earlier program. This plan calls for a 
collaborative effort with researchers in an attempt to understand
how the central nervous system processes subliminal stimuli. This
should assist in understanding how MEG correlates occur. 

- The relationship between mind and body is 
currently discussed in the research literature as well as in the 
popular press. Researchers at the California Institute for
Transpersonal Psychology (CITP) have been active in investigating
the role of mental attitudes and body chemistry. While there may
not be a direct link with AC, an exchange of techniques and
experimental designs would be helpful.

- The Journal of Cognitive Neuroscience
contains at least one article of interest in each issue. This
discipline is where most of the cognitive work with
neuromagnetism is conducted. There is the possibility of joint
investigations with researchers performing MEG investigations at
the National Institutes of Health (NIH).


- Stanford University has been conducting
research on internal mental imagery. The manipulation and
control of this imagery is extremely important in understanding
the source of internal noise during an AC session. A 
collaborative effort with Stanford should lead to methods for 
noise reduction. 






- Neural networks are particularly good at
recognizing subtle patterns in complex data, and are being
applied in the subjective arena of decision making in business.
In order to improve AC analysis, the program will conduct a
collaborative effort with scientists who are active in neural
network research and with selected individuals who have had
success with interpreting highly subjective data.

- Statistics is the heart of AC research in 
that most of the results are usually quoted in statistical terms. 
Hypothesis testing has traditionally been the primary focus, but 
there are other possible approaches that should be explored. 
Statistics researchers at Harvard have already expressed interest 
in contributing to the research effort. 


- A major portion of the effort will be a
search for an AC evoked response in the brain. Sophisticated
processing is required in that magnetic signals from the brain
cannot be easily characterized by standard statistical
practices. Several research facilities can contribute.


- Classical statistical thermodynamics may be
the heart of understanding the nature of an AC source of
information. A physical property called entropy may be related
to what is sensed by AC. The program intends to collaborate with
a variety of university physics departments to calculate the
appropriate parameters.
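
As an illustration of the hypothesis-testing approach named in the statistics item above, the following is a minimal sketch of a one-sided exact binomial test on a series of forced-choice sessions. The function name, the four-choice judging design (chance rate 0.25), and the session counts are hypothetical and are not taken from this plan.

```python
from math import comb


def binomial_p_value(hits: int, trials: int, chance: float) -> float:
    """One-sided exact p-value: P(X >= hits) for X ~ Binomial(trials, chance)."""
    if not 0 <= hits <= trials:
        raise ValueError("hits must lie between 0 and trials")
    # Sum the binomial probabilities of every outcome at least as extreme.
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(hits, trials + 1)
    )


# Hypothetical series: 40 sessions, 4-choice judging (chance = 0.25), 16 hits.
p = binomial_p_value(hits=16, trials=40, chance=0.25)
print(f"one-sided p-value: {p:.4f}")
```

A small p-value only indicates that the hit count is unlikely under the chance model; it says nothing about mechanism, which is why the plan calls for exploring approaches beyond hypothesis testing.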

(S/NF) The specific experiments to be conducted in
these research domains will be defined during the first six to
nine months of the program utilizing the recommendations of the
working groups mentioned above, subject to approval by the
Scientific Oversight Committee.


VI. (U) BASIC RESEARCH PLAN FOR ANOMALOUS PERTURBATION:


(S/NF) Figure 5 illustrates the basic approach for
investigating "energetics", or anomalous perturbation (AP)
phenomenon. Intelligence reporting indicates that this aspect of
[illegible] should receive
attention in this research plan to prevent technological
surprise. Thus, beginning in FY95, acceptance criteria will be
established with which to judge the historical literature for
potential AP effects. Using those criteria, a detailed review of
the literature will begin in mid FY95 and, considering the size of
that data base, will continue through FY95. Knowledge gained from
this review may provide insights for the development of new AP
target systems or provide data so that particular experiments can
be replicated. Given the complexity of most AP experiments,
considerable time is needed to plan and conduct them properly.
If the results warrant, then application development may begin as
early as FY96; however, the primary task of basic research of AP
is to attempt to validate its existence. Findings from foreign
research will be examined and factored into this activity as
appropriate.

Figure 5 (U) Basic Research Milestones - Anomalous Perturbation
(To Include Biological Systems)

(S/NF) The keys to investigating this area will be in 
appropriate personnel selection and, very likely, in proper 
selection of the AP test device. Thus, the initial phase of this 
effort will involve identification and solicitation of 
individuals known or claimed to have such talents. For example, 
certain expert martial arts or yoga practitioners might do well 
in such experiments due to their strong mental conditioning and 
ability for intense mental focus. After locating such 
individuals, various instruments, such as microcomputer devices, 
sensitive electronic/sensor devices, or other unique or sensitive
equipment would be used as targets in AP experiments. 

(S/NF) Some of the unique sensor candidates include 
devices that are highly sensitive to very weak gravitational 
effects (such as Mossbauer devices or atomic clocks). Perhaps
the most promising device is one that involves detection of an
unusual non-electromagnetic wave (a vector/scalar wave). If
experiments with such sensors are successful, then significant
understanding of AP or AC phenomenon would occur. Experiments
with such a device are a distinct near-term possibility;
consequently, this will be given high priority in the early part
of this long-range program.

(S/NF) Should these pilot experiments prove successful, 
then near and distant experiments would be developed for a wide
variety of devices to evaluate application aspects. Potential 
applications could include, for example, remote switching (in a 
communication role) or possibly as a countermeasure to minimize 
effectiveness of threat systems such as sensitive computer 
components or sensors. Similarly, if these results are 
successful, they would provide insight regarding potential 
threats to U.S. systems or security. 

(S/NF) The specific experiments to be conducted in these
research domains will be defined during the first six to nine 
months of the program utilizing the recommendations of the 
working groups mentioned above. 

VII. (U) APPLIED RESEARCH PLAN FOR ANOMALOUS COGNITION:

(U) Figure 6 illustrates the overall plan for the applied
research portion for several main functional categories.

a. (U) SELECTION 

(C) The most promising potential for selecting 
individuals is to identify ancillary activity that correlates 
with AC ability. If such a procedure can be identified, then 
receiver selection can be incorporated as part of other screening 
tests (e.g., fighter pilot candidacy), and thus large populations 
can be used. Among the items that will be examined are 
physiology (e.g., responses of the brain to external stimuli) and 
hypnotic susceptibility (i.e., an individual's predisposition for
being hypnotized). The results of this effort will be examined
continuously; however, a decision to end the investigation will
occur in mid FY96. Should the results at that time warrant, then
refining of the techniques will continue to the end of FY98.
The reason the initial research spans several years is that to 
validate even one psychological finding requires long-term 
testing of candidate individuals. Current statistical methods 
require many AC sessions, and experience has shown that only a 
few sessions can be conducted per week for any single individual. 
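
The scheduling constraint described above can be made concrete with a rough sample-size calculation. The sketch below uses the standard normal-approximation formula for a one-sided test of a single proportion; the hit rates, significance level, power, and sessions-per-week figure are hypothetical values chosen only for illustration, not program data.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sessions_needed(chance, true_rate, alpha=0.05, power=0.80):
    """Approximate number of sessions for a one-sided test of a hit rate.

    Normal approximation:
    n = ((z_a*sqrt(p0*q0) + z_b*sqrt(p1*q1)) / (p1 - p0))**2
    """
    if not 0 < chance < true_rate < 1:
        raise ValueError("require 0 < chance < true_rate < 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(power)
    numerator = (z_alpha * sqrt(chance * (1 - chance))
                 + z_beta * sqrt(true_rate * (1 - true_rate)))
    return ceil((numerator / (true_rate - chance)) ** 2)


def weeks_needed(sessions, sessions_per_week):
    """Calendar weeks required when only a few sessions fit in each week."""
    return ceil(sessions / sessions_per_week)


# Hypothetical: chance = 0.25, assumed true hit rate = 0.35, 3 sessions/week.
n = sessions_needed(chance=0.25, true_rate=0.35)
print(n, "sessions, about", weeks_needed(n, 3), "weeks for one individual")
```

Under assumptions of this kind, evaluating even one candidate individual takes many months, which is consistent with the multi-year span of the selection research.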

(C) The previous program was able to estimate
that approximately one percent of the general population
possessed a high-quality, natural AC ability. Because the
empirical method (i.e., asking large groups to attempt AC) is
labor intensive and very inefficient, it is included in the
research plan only as an alternate approach.

b. (U) TRAINING 

(S/NF) Training has been a major part of the 
previous program; however, results of training approaches have 
been difficult to evaluate and have not been examined 
systematically. Systematic review of this issue was begun in
FY92. One of the methods that will be examined involves lowering
an individual's visual subliminal threshold (i.e., the level
below which an individual is not consciously aware of visual
material). This could enhance the individual's sensitivity to AC
data. Other forms of altered states, such as dreaming and 
hypnosis, will also be evaluated to see if such states can 
enhance AC data quality. 

Figure 6 (U) Applied Research Milestones - Anomalous Cognition

(U) Results on these issues should be available
at the close of FY95. If no progress has been observed and if
there have been no positive results from the basic research, the
task ends. However, should any of the variables examined appear
promising, then the task will be continued.

(S/NF) It is anticipated that all laboratory
successes must be validated by simulating operational tasks.
These experiments involve identifying the specialty to be tested
and the acceptance criteria, and conducting sessions in which the
complete target systems are known. This three-year activity runs
concurrently with the other tasks but with a one-year offset to
allow for planning.

c. (U) TARGET/APPLICATION SELECTION

(C) Based on earlier research, the most promising 
approach to target selection appears to be a single physical 
characteristic called entropy (i.e., a measure of inherent target 
information). Beginning in FY95, two and one half years have 
> been allocated for the detailed study of this aspect of target 

properties. Initially, little experimentation is required; 
rather, a retrospective examination of previous target systems 
should indicate if this approach is valid. Included in this 
examination are detailed calculations of the information content 
of natural target scenes. 
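
As an illustration of the kind of information-content calculation described above, the following minimal Python sketch computes the Shannon entropy of a scene's pixel-intensity distribution. It is an assumption made for illustration only: this plan does not specify how target entropy is to be calculated, and the function name and the two sample scenes are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Return the Shannon entropy, in bits per pixel, of a flat
    sequence of intensity values (a hypothetical scene encoding)."""
    if not pixels:
        raise ValueError("scene must contain at least one pixel")
    total = len(pixels)
    counts = Counter(pixels)
    # H = -sum(p * log2(p)) over the observed intensity values.
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical scenes: a featureless field and a maximally varied one.
flat_scene = [128] * 16
varied_scene = list(range(16))

print(shannon_entropy(flat_scene))
print(shannon_entropy(varied_scene))
```

Under this assumed measure, a featureless scene scores lowest and a scene in which every intensity value is distinct scores highest, which gives a simple way to rank previous target systems in a retrospective examination.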

(S/NF) Beginning in mid FY96, other potential
intrinsic target properties will be examined. For example, a
target may be more readily sensed by AC if the collection of
elements at the site (e.g., landmarks, buildings, roads)
constitutes a conceptually coherent unit as opposed to a collage
of unrelated items. Quantitative definitions of targets will also
be developed that include non-physical target parameters such as
function, meaning, or relationships. These aspects are highly
important in most operational projects and need to be quantified.


(S/NF) Part of this effort will involve
investigations that serve two purposes: (1) add insight into
the phenomenon; and (2) help evaluate the feasibility of certain
potential applications. For example, long distance experiments
could be conducted to or from deep caves or submarines in deep
water to test communication potential and transmission theories.
Experiments could also be conducted to targets on board space
platforms to test distance and gravitational effects.
Experiments to or from magnetically shielded rooms or certain
earth locations (e.g., the magnetic pole) might indicate if
magnetic fields influence the phenomenon. Experiments to
opposite sides of the earth might also indicate if a mass or
gravity effect can be noted.



Approved For Release 2003/04/18 : CIA-RDP96-00789R002700020001-0 



(S/NF/SG/LIMDIS) This area of investigation will
be integrated with a variety of applications in coordination with
findings/investigations pursued by the in-house effort. Figure 9
identifies the main application or operational areas, along with
the types of data desired. This activity will be integrated, where
possible, into in-house pursuits that will explore these areas in
a systematic fashion. Initial emphasis will be in the
counternarcotics and counterterrorism areas.

(S/NF/SG/LIMDIS) Specific types of applications
that will be explored in depth include the search problem.
Search tasks are expected to remain high-priority operational
tasks (e.g., hostage location, lost equipment or system
location). Search tasks are complicated by timing issues,
especially if the missing target is being moved frequently.
Related to this will be examination of predictive capability in
order to evaluate the feasibility of detecting hostile plans and
intentions in advance. Pilot studies of other areas (e.g., code
breaking, medical diagnostics, low intensity conflict support)
will also be initiated.

(S/NF/SG/LIMDIS) Another application area that
will be examined is "communications." Previous research
indicates that with proper protocols, basic or coded messages can
be sent and received via AC procedures. Redundant coding methods
can readily enhance the probability of success, and new statistical
methods can also improve success rates. Communication
applications may have significant value for search problems by
providing additional information on the location of kidnap
victims or hostages. Such techniques might also help in determining
hostage or POW state of health or other significant issues.
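
The effect of redundant coding on success probability can be illustrated with the simplest redundant scheme, a repetition code decoded by majority vote. The Python sketch below is an assumption made for illustration: this plan does not name a coding method, the per-trial accuracy values are hypothetical, and the function names are invented for this example. It assumes each repeated trial is an independent binary response.

```python
from math import comb


def majority_vote_success(p, n):
    """Probability that a majority of n independent binary trials is
    correct when each trial is correct with probability p (n odd)."""
    if n % 2 == 0:
        raise ValueError("use an odd number of repetitions to avoid ties")
    k_min = n // 2 + 1  # smallest number of correct trials that wins the vote
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


def decode_bit(responses):
    """Decode one message bit from repeated 0/1 responses by majority vote."""
    return 1 if 2 * sum(responses) > len(responses) else 0


# Hypothetical per-trial accuracy of 0.6, with increasing redundancy.
for repetitions in (1, 5, 15):
    print(repetitions, round(majority_vote_success(0.6, repetitions), 4))

print(decode_bit([1, 0, 1, 1, 0]))
```

Under these assumptions, repetition helps only when the per-trial accuracy is above chance; at exactly 0.5 the majority vote stays at 0.5 regardless of the number of repetitions.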

d. (U) PROTOCOLS 

(U) Given the laboratory success of AC
experimentation, the protocol task can build upon a substantial
literature. Determining optimal, specialty-dependent protocols
requires only extending current concepts. Several years are
needed because of the statistical nature of the analysis
required to determine the effects of environment, receiver,
target and feedback conditions. Several high-interest
application areas (such as search/location) will be examined in
detail. A variety of session procedures will be evaluated to
determine those that improve data quality.

(S/NF) Protocol effectiveness may be measured by the
quality, quantity, and/or usefulness of the AC information
elicited by its use. The requirements for protocols that are
designed for laboratory settings are considerably more
restrictive than those required for operational settings. For
example, providing limited information to a receiver while an
operational session is in progress (i.e., intermediate feedback)
might facilitate the acquisition of the desired data. This kind
of feedback is strictly prohibited, however, in most protocols
designed for laboratory experiments. Protocols may also vary
depending on the nature of the data required. For example, for some
search projects, general data alone may be adequate. Such
cases would not require development of highly specific details,
and the session protocols would not be as complex.

(U) A detailed protocol will need to consider a
variety of potential session variables such as the individual's
physical environment, mental state and attitude, and how the
target or task is designated (e.g., coordinates, abstract terms).
Other variables include specifics of the session (monitor present or
not), type of feedback, type of response data (e.g., predictive),
and mode and method of response (e.g., drawings, verbal).

(S/NF) Currently, the only known way to
resolve the above issues is to conduct a large number of trials
for a given individual with as many of the potential variables as
possible held constant. Standard statistical methods can then be
used to identify trends, patterns, and operational constraints.
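
One standard statistical method of the kind referred to above is an exact binomial test of an individual's hit count against chance expectation. The Python sketch below is illustrative only: it assumes forced-choice trials with a known chance hit probability, which this plan does not specify, and the trial counts shown are hypothetical rather than program data.

```python
from math import comb


def binomial_tail(hits, trials, chance):
    """One-sided exact p-value: probability of at least `hits`
    successes in `trials` independent trials at rate `chance`."""
    if not 0 <= hits <= trials:
        raise ValueError("hits must lie between 0 and trials")
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(hits, trials + 1)
    )


# Hypothetical record: 40 trials, 4 choices each (chance = 0.25), 16 hits.
p_value = binomial_tail(16, 40, 0.25)
print(f"p = {p_value:.4f}")
```

Holding the session variables constant across trials, as the paragraph above requires, is what justifies treating the trials as independent draws at a fixed hit rate.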

e. (U) DATA ANALYSIS 

(U) This area requires extensive review of
leading analysis tools, such as those required for describing
imprecise concepts or data (e.g., artificial intelligence
techniques, fuzzy sets). This work will be combined with
findings from neural network analysis and research, or possibly
combinations of other emerging advanced analysis methods.

(S/NF) Various approaches are anticipated to
directly benefit operational evaluations. One promising
technique involves procedures based on an adaptive (frequent data
base update) approach. This will permit an individual's
progression, and possibly time-dependent data variables in an
individual's track record, to be identified.
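
One possible reading of an adaptive, frequently updated track record is a running estimate that is revised after every session, with recent sessions weighted more heavily. The Python sketch below uses an exponentially weighted average for this purpose. This technique is an assumption chosen for illustration, not a procedure stated in this plan; the function names, the weight value, and the session scores are hypothetical.

```python
def update_track_record(previous, score, weight=0.5):
    """Blend a new session score into the running estimate.
    `weight` (hypothetical) is the share given to the newest session."""
    if not 0.0 < weight <= 1.0:
        raise ValueError("weight must be in the interval (0, 1]")
    return weight * score + (1.0 - weight) * previous


def track_record(scores, weight=0.5):
    """Return the running estimate after each session, in order."""
    history = []
    estimate = None
    for score in scores:
        if estimate is None:
            estimate = score  # the first session seeds the record
        else:
            estimate = update_track_record(estimate, score, weight)
        history.append(estimate)
    return history


# Hypothetical session scores on a 0-to-1 quality scale.
session_scores = [1.0, 0.0, 1.0, 1.0]
print(track_record(session_scores))
```

Because the estimate is recomputed after each session, a rising or falling sequence of values would make an individual's progression over time visible in the record.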

(S/NF) In addition to the search for new analysis 
methods, the current methods will also be reexamined. Laboratory 
requirements differ from those for operational activities in that 
the target can be controlled and well defined. For operational 
activities, uncertainties in tasking may arise, especially if 
operational requirements are changing or if some of the initial 
"known" data are incorrect. Such uncertainties complicate later 
analyses. 


(S/NF) Analysis methods will also be developed 
that can make predictions on data quality for any given task. 
This will require development of an extensive track record for 
each individual based on both controlled and operational 
projects.

(S/NF) These analysis methods will also address
certain practical issues. For example, a detailed, high-quality
example of AC data may have little value to an intelligence
analyst if that information was known from other sources.
Likewise, a poor example of AC data might provide a single
element as a tip-off for other assets, or provide the missing
piece in a complex analysis, and thus be quite valuable. The
intelligence utility of AC data may in some cases be only weakly
connected to the AC quality. Therefore a data fusion analysis
procedure is needed for AC-derived operational data. Methods
that permit appropriate data analysis from an accuracy and
utility viewpoint will be developed.

f. (U) INTEGRATION 

(U) This activity would be an ongoing review/integration
effort in order to identify patterns or clues useful
for understanding practical aspects of this phenomenological 
area. 

(S/NF) Identifying approaches and procedures that
permit assimilation of AC data from operational support projects
into all-source intelligence analysis procedures will also be
part of this support activity. Depending on results of applied
research findings and operational pursuits, a basic seminar/training
program for other applications-oriented elements might
be established. Such a training/seminar program would focus on
basic techniques and would augment possible operational training
activity that might become part of the in-house effort. This
would require several years to develop and establish.

(S/NF) The specific experiments to be conducted
in these research domains will be defined during the first six to
nine months of the program utilizing the recommendations of the
working groups mentioned above.

(S/NF/SG/LIMDIS) The in-house and external research
pursuits identified in this overall research and peer review plan 
have the potential for achieving highly significant results using 
AMP to address problems of national security by pushing the 
phenomena to their natural limits. This overall result will be 
achieved by: 

- Determining the underlying physical
mechanisms of AMP.

- Isolating specific brain processes
involved in the phenomenon.

- Identifying unique applications
involving "energetics" phenomena (e.g.,
remote switching).

(S/NF/SG/LIMDIS) It is the intention of STAR GATE to
pursue all aspects of this area with high intensity, drawing on 
an experienced and well-qualified staff along with appropriate 
external assistance, in order to quantify and evaluate all 
available classified and unclassified research. By so doing, 
discoveries into how these phenomena work may be achievable. How
to identify people with such talent (or potential for it) and how
to develop/train selected individuals should also be a natural
end result. STAR GATE will also draw heavily from lessons learned in
all previous research and application investigations on a
worldwide basis.

X. (U) PROJECT OVERSIGHT METHODOLOGY : 

A. (U) PROGRAM MANAGEMENT/OVERSIGHT

(S/NF) DIA, as executive agent, has implemented a 
management structure that fosters a proactive, responsive, and 
creative environment for this activity. Both external research 
and in-house activities are centered in one unit (PAG-TA) under 
the direct supervision of the Director, Office for Ground Forces 
(DIA/PAG).


(S/NF) Project oversight for this program will be
provided by a Project Review Board (PRB) composed of five senior
management individuals selected from areas of DIA outside of the
National Military Intelligence Production Center (NMIPC). In
addition, a six-member Project Oversight Panel will be
established to provide program and technical guidance on all STAR
GATE activities. The 28-member DIA Advisory Board has been
apprised of the STAR GATE program and its recommendations have
been incorporated into project activities. Review/guidance is
available from DIA's Executive Director and from the Deputy
Director. The General Defense Intelligence Program (GDIP) staff
director conducts periodic project reviews and provides guidance.
Links with the Intelligence Community help provide a broader
management and program review base for this activity.

(U) The extensive nature and scope of these various
program management and oversight activities will ensure that all
activities identified in this long-range plan can be
appropriately monitored and evaluated on an ongoing basis.

B. SCIENTIFIC OVERSIGHT 

(S/NF) Oversight for external contract activity is
currently provided by a six-member expert Scientific Oversight
Committee (SOC). A Human Use Review Board has also been
established to provide expert guidance/advice regarding
contractor adherence to appropriate DoD human use regulations.

(U) There is currently in place a contractor
Scientific Oversight Committee (SOC) which is tasked with three
major responsibilities:

a. Review and approve all experimental protocols 
prior to the collection of experimental data. 

b. Critically review all experimental final 
reports as if they were submissions to technical scientific 
journals. All remarks in writing are included in the final 
technical reports to DIA. 

c. Suggest directions for further research. 

(U) In addition to these responsibilities, the SOC
members are encouraged to exercise unannounced drop-in
privileges to view experiments in progress.

(U) The five voting members of the SOC are respected
scientists from the following disciplines: physics, astronomy,
statistics, neuroscience, and psychology. See Appendix E for
membership data.

(U) A contractor Institutional Review Board (IRB) is 
currently in place with the responsibility of assuring compliance 
with all U.S. and DoD regulations with regard to the use of 
humans in experimentation and assuring their safety. The IRB 
members represent the health, legal, and spiritual professions in 
accordance with government guidelines. See Appendix F for 
membership data. 

(U) It is anticipated that oversight of this program 
will be conducted by these Committees, if available, or new 
committees with equivalent scientific credentials. 

XI. (U) DEVELOPMENT OF EVALUATION CRITERIA : 

A. (U) SCIENTIFIC VALIDITY 

(S/NF) The STAR GATE Scientific Advisory Committee has 
determined that the scientific validity of the STAR GATE program 
has been satisfactorily demonstrated under the most demanding of 
experimental protocols. A statistically significant anomaly
does exist which cannot be currently explained by conventional 
means. For example, 77% of academics in the arts, humanities, 
and education believe that AMP is either an established fact or a 
likely possibility. Supporting technical evidence contained in 
technical studies may be found at Appendix G. 

(S/NF) A substantial number of examples dating back to
1972 provide at a minimum prima facie evidence that AMP can be
used in such a way as to provide a "value-added" function to the
Intelligence Community. Appendix H is a formal evaluation of the
use of AMP for intelligence gathering purposes conducted in 1987.
The overall findings of this evaluation were that "...the Project
Review Group has determined to its satisfaction that the work of
the Enhanced Human Performance Group is scientifically
sound...and is providing valuable insight into the nature of an
anomaly which have a significant impact on the DoD."

B. (U) PERFORMANCE 

(S/NF) The ability of the STAR GATE program to produce 
results that have an intelligence value can only be measured by 
customer evaluations. AMP-provided intelligence data, along with
other forms of intelligence, are evaluated, in part, with 
subjective criteria. STAR GATE will develop feedback mechanisms 
and procedures for customers that will result in a method of 
quantifying this subjective feedback and evaluation data so that 
the value added and cost-effectiveness can be measured. 

XII. (U) BUDGET AND RESOURCE REQUIREMENTS 
(FYs 95-99) : 

(S/NF/SG/LIMDIS) Due to the diversity of the STAR GATE
mission/objectives, both external resources and in-house
expertise are required. Since this Activity possesses no
in-house R&D capability, external R&D support is absolutely
required to meet the Congressional concerns addressed in
this program plan. A balance will be maintained between external
and in-house activities, and every effort will be made to
integrate and link these activities where appropriate. The
external aspect permits a wide range of expertise covering many
disciplines to be focused on this area; this also has the benefit
of ensuring peer group review and of facilitating a variety of
scientific interactions. In-house personnel with a wide range of
expertise in this phenomenology will need to be retained to make
this proposed plan work.

(S/NF) In order to review the major tenets of the draft 
program plan, the Defense Intelligence Agency will convene a 
panel of appropriate scientists to provide recommendations on the 
plan and the research it achieves. Based on the panel's 
recommendations, the Defense Intelligence Agency will then submit 
a budget line item to fund those approved objectives. 

(C) An annual report will document the current 
operational, technical and administrative status of the program.

APPENDIX A

CONGRESSIONALLY-DIRECTED ACTION 
DEFENSE AUTHORIZATION CONFERENCE 


(S/NF) REQUEST: "The conferees are concerned that insufficient
funds have been spent on research and development to establish
the scientific basis for the STAR GATE program. The conferees 
direct the Director of DIA to prepare a program plan and to 
submit an appropriate budget request for a research effort, over 
several years, to determine whether the STAR GATE program can 
show results that are cost-effective and satisfy reasonable 
performance criteria. This plan, and any research under this 
program, should be subject to peer review by neutral scientific 
experts. The Director of DIA is directed to prepare this 
research and peer review plan within existing program funds."

TERMINOLOGY AND DEFINITIONS


(U) PHENOMENA TERMINOLOGY : 

(U) This phenomenological area has had a variety of
descriptive terms over the years, such as paranormal,
parapsychological, or psychical research. Foreign researchers
use other terms: "psychoenergetics" in the USSR; "extraordinary
human function" in the People's Republic of China (PRC). In
general, this field is concerned with a largely unexplored area
of human consciousness/subconsciousness interactions associated
with unusual or underdeveloped human capabilities.

(U) Recently, researchers have shown a preference for terms
that are neutral and that emphasize the anomalous or enigmatic
nature of these phenomena. The term anomalous mental phenomena
(AMP) is generally preferred.

(U) This area has two aspects: information access and
energetics influence. Information access refers to a mental
ability to describe remote areas or to access concealed data that
are otherwise shielded from all known sensory channels. A recent
term for this ability is anomalous cognition (AC). This term
places emphasis on potential understanding that might be
available from advances in sensory/brain functioning research or
other related research. Older terms for this aspect have
included extra-sensory perception (ESP), remote viewing (RV), and
in some cases, precognition.

(U) The energetics aspect refers to the ability to 
influence, via mental volition, physical or biological systems by 
an as yet unknown physical mechanism. An example of physical 
system influence would include affecting the output of sensors or 
electronic devices; biological systems influence would include 
affecting physiological parameters of an individual. A recent 
descriptive term for this ability is anomalous perturbation (AP).
Older terms for this phenomenon included psychokinesis (PK) or 
telekinesis. 

(U) GENERAL DEFINITIONS : 

(S/NF) For this program, basic research is defined to mean
any investigation or experiment for determining fundamental
processes or for uncovering underlying parameters that are
involved in this phenomenon. Basic research is primarily
oriented toward understanding the physical, physiological, and
psychological mechanisms of anomalous mental phenomena (AMP).

(S/NF) Applied research refers to any investigation
directed toward developing particular applications or
improving data quality and reliability. For the anomalous cognition
(AC) phenomenon, research is primarily directed toward improving
the output quality of AC data. This would include ways to
develop/improve the utility of AC data for a variety of potential
applications. For example, examination of spatial and temporal
relationships of AC data could assist in developing a reliable
search capability useful for locating missing people or
equipment.

POTENTIAL RESEARCH SUPPORT FACILITIES


ANOMALOUS MENTAL PHENOMENA

Science Applications International Corp.            Los Altos, CA
Mind Science Foundation                             San Antonio, TX
Princeton Engineering Anomalies Laboratory          Princeton Univ, NJ
American Society for Psychical Research             New York, NY
St. John's University                               Long Island, NY
Foundation for Research into the Nature of Man      Durham, NC
ARE/Atlantic University                             Virginia Beach, VA
University of Virginia                              Charlottesville, VA
Psychophysical Research Laboratories                Edinburgh, Scotland
Edinburgh University                                Edinburgh, Scotland

OTHER RELATED DISCIPLINES

Psychology

Stanford University                                 Stanford, CA
Cornell University                                  Ithaca, NY

Anthropology

University of California                            Berkeley, CA
University of Arizona                               Tucson, AZ

Psychophysiology

SRI International                                   Menlo Park, CA
Langley-Porter Neuropsychiatric Institute           San Francisco, CA
Menninger Foundation                                Topeka, KS

Psychoimmunology

California Institute for Transpersonal Psychology   Menlo Park, CA

Cognitive Neuroscience

Los Alamos National Laboratory                      Los Alamos, NM
Sandia National Laboratory                          Albuquerque, NM
University of California                            San Diego, CA

Cognitive Psychology

Psychology Department, Princeton Univ               Princeton, NJ
Psychology Department, City College of New York     New York, NY

Artificial Intelligence

Massachusetts Institute of Technology               Cambridge, MA
Stanford University                                 Stanford, CA

Neural Networks

Massachusetts Institute of Technology               Cambridge, MA
Science Applications International Corp             Los Altos, CA

Statistics/Signal Analysis

University of California                            Davis, CA
Harvard University                                  Cambridge, MA

Thermodynamics

Rochester University                                Rochester, NY
Physics Department, Stanford University             Stanford, CA

Quantum Measurement

International Business Machines,                    College Park, MD
Research Laboratories

General Relativity

California Institute of Technology                  Pasadena, CA
University of Texas at Austin                       Austin, TX

Electromagnetic/Basic Research

Electronetics Corp                                  Buffalo, NY
Battelle Corp                                       Columbus, OH
Institute for Advanced Study                        Austin, TX

RESOURCE LITERATURE

1. A.R.E. Journal

2. Abnormal Hypnotic Phenomena

3. American Anthropologist

4. American Ethnologist

5. American Journal of Clinical Hypnosis 

6. American Journal of Physiology 

7. American Journal of Sociology 

8. American Psychologist

9. American Society for Psychical Research 

10. Annals of Eugenics 

11. Annals of Mathematical Statistics 

12. Annales des Sciences Psychiques

13. Archivio di Psicologia, Neurologia e Psichiatria

14. Association for the Anthropological Study of Consciousness 
Newsletter 

15. Behavioral and Brain Science 

16. Behavioral Science 

17. Bell System Technical Journal 

18. Biological Psychiatry

19. Biological Review 

20. British Journal for the Philosophy of Science 

21. British Journal of Psychology 

22. Bulletin of the American Physical Research 

23. Bulletin of the Boston Society for Psychic Research 

24. Bulletin of the Los Angeles Neurological Societies 

25. Contributions to Asian Studies 

26. Electroencephalography and Clinical Neurophysiology 

27. Endeavour 

28. Ethnology 

29. Exceptional Human Experience

30. Experientia

31. Experimental Medicine and Surgery 

32. Fate 

33. Fields within Fields 

34. Foundations of Physics 

35. Hibbert Journal 

36. Human Biology

37. International Journal of Clinical and Experimental Hypnosis 

38. International Journal of Comparative Sociology

39. International Journal of Neuropsychiatry

40. International Journal of Parapsychology 

41. International Journal of Psychoanalysis 

42. Journal of Abnormal and Social Psychology 

43. Journal of Altered States of Consciousness 

44. Journal of Applied Physics 

45. Journal of Applied Psychology 

46. Journal of Asian and African Studies 

47. Journal of Biophysical and Biochemical Cytology 

48. Journal of Cell Biology 

49. Journal of Communication 

50. Journal of Comparative and Physiological Psychology 

51. Journal of Consulting Psychology 

52. Journal of Existential Psychiatry 

53. Journal of Experimental Biology 

54. Journal of Experimental Psychology 

55. Journal of General Psychology 

56. Journal of Genetic Psychology 

57. Journal of Mind and Behavior

58. Journal of Nervous and Mental Diseases

59. Journal of Personality 

60. Journal of Personality and Social Psychology 

61. Journal of Research in PSI Phenomena 

62. Journal of Scientific Exploration 

63. Journal of the American Academy of Psychoanalysis 

64. Journal of the London Mathematical Society

65. Journal of the Royal Anthropological Institute of Great 
Britain and Ireland 

66. Metapsichica 

67. Mind-Brain Bulletin 

68. Motivation and Emotion 

69. Nature

70. Naturwissenschaftliche Rundschau 

71. New Horizons 

72. New Scientist 

73. New Sense Bulletin

74. Newsletter of the Parapsychology Foundation 

75. Parapsychology Bulletin 

76. Parapsychology Abstracts International 

77. Parapsychology Review 

78. Perceptual and Motor Skills 

79. Philosophy of Science 

80. Physiology and Behavior 

81. Proceedings of the Society for Psychical Research 

82. Psychedelic Review 

83. Psychic

84. Psychic Science

85. Psychoanalytic Quarterly

86. Psychoanalytic Review

87. Psychological Bulletin

88. Psychometrika

89. Psychophysiology

90. Physics Today

91. Renti Teyigongneng (EFHB Research) [PRC]

92. Revue Metapsychique

93. Revue Philosophique

94. Revue Philosophique de la France et de L'Etranger

95. Revue Philosophique Applique

96. Science

97. Skeptical Inquirer

98. Social Studies of Science

99. Subtle Energies

100. The Humanistic Psychology Institute

101. The Journal of Parapsychology

102. The Journal of the American Society for Psychical Research

103. Theta

104. Tijdschrift voor Parapsychologie

105. Voprosy Filosofii (Questions of Philosophy) [RUSSIA]

106. Western Canadian Journal of Anthropology

107. Zeitschrift fur die Gesamte Neurologie und Psychiatrie

108. Zeitschrift fur Parapsychologie und Grenzgebiete der
Psychologie

109. Zeitschrift fur Tierpsychologie

110. Zeitschrift fur Vergleichende Physiologie

111. Zetetic Scholar

112. Zhongguo Shehui Kexue (China Social Sciences) [PRC]

113. Ziran Zazhi (Nature) [PRC]

APPENDIX E

CURRENT CONTRACTOR SCIENTIFIC OVERSIGHT COMMITTEE MEMBERSHIP 


Steven A. Hillyard

- Professor of Neurosciences, Department of Neurosciences,
University of California, San Diego.

- Author or coauthor of 118 technical neuroscience
publications.

- Eighty-two invited presentations at technical conferences.

- Ph.D., Yale University, 1968 (Psychology).

S. James Press

- Professor of Statistics, Department of Statistics, University
of California, Riverside.

- Author or coauthor of 132 statistics publications.

- Author of 12 books and/or monographs.

- Ph.D., Stanford University, 1964 (Statistics).

Garrison Rapmund

- Responsible for facilitating transfer of Strategic
Defense Initiative technologies to health care industries.

- Major General, USA, retired in 1986 as Assistant Surgeon
General (R&D) and Commander, Army Medical R&D Command.

- M.D., Columbia University, 1953 (Pediatrics).

Melvin Schwartz

- Associate Director for High Energy and Nuclear Physics,
Brookhaven National Laboratory.

- Author or coauthor of 40 technical publications in high energy
physics; author of "Principles of Electrodynamics."

- Nobel Prize, Physics (1988).

- Ph.D., Columbia University, 1958 (Physics).

Yervant Terzian

- Professor of Physical Sciences, Chairman of the Department of
Astronomy, Cornell University.

- Author/coauthor of numerous technical publications and books.

- Ph.D., Indiana University, 1965 (Astronomy).

Philip G. Zimbardo

- Professor of Psychology, Department of Psychology, Stanford
University.

- Author/coauthor of numerous experimental psychology
publications.

- Ph.D., Yale University, 1959 (Psychology).

APPENDIX F

CURRENT CONTRACTOR INSTITUTIONAL REVIEW BOARD MEMBERSHIP 


Byron Wm. Brown, Jr., Ph.D.

- Biostatistics, Stanford University

Gary R. Fujimoto, M.D.

- Occupational Medicine, Palo Alto Medical Foundation

John Hanley, M.D.

- Neuropsychiatry, University of California, Los Angeles

Robert B. Livingston, M.D.

- Neuroscience, University of California, San Diego

Robin P. Michelson, M.D.

- Otolaryngology, University of California, San Francisco

Ronald Y. Nakasone, Ph.D.

- Buddhist Studies, Institute of Buddhist Studies, Berkeley, CA

Garrison Rapmund, M.D. (Chair)

- Air Force Science Advisory Board

Louis J. West, M.D.

- Neuropsychiatry, University of California, Los Angeles

Psychological Bulletin (January, 1994)


Does Psi Exist? 

Replicable Evidence for an 
Anomalous Process of Information Transfer 


Version 4.7 
October 1, 1993 


Daryl J. Bem and Charles Honorton

Most academic psychologists do not yet accept the existence of psi, anomalous processes of
information or energy transfer (such as telepathy or other forms of extrasensory perception) that
are currently unexplained in terms of known physical or biological mechanisms. We believe
that the replication rates and effect sizes achieved by one particular experimental method, the
ganzfeld procedure, are now sufficient to warrant bringing this body of data to the attention of
the wider psychological community. Competing meta-analyses of the ganzfeld database are
reviewed, 1 by R. Hyman (1985), a skeptical critic of psi research, and the other by C. Honorton
(1985), a parapsychologist and major contributor to the ganzfeld database. Next the results of
11 new ganzfeld studies that comply with guidelines jointly authored by R. Hyman and C.
Honorton (1986) are summarized. Finally, issues of replication and theoretical explanation are
discussed.


The term psi denotes anomalous processes of information
or energy transfer, processes such as telepathy or
other forms of extrasensory perception that are currently
unexplained in terms of known physical or biological
mechanisms. The term is purely descriptive: It neither
implies that such anomalous phenomena are paranormal
nor connotes anything about their underlying mechanisms.

Does psi exist? Most academic psychologists don't think
so. A survey of more than 1,100 college professors in the
United States found that 55% of natural scientists, 66% of
social scientists (excluding psychologists), and 77% of
academics in the arts, humanities, and education believed
that ESP is either an established fact or a likely possibility.
The comparable figure for psychologists was only 34%.
Moreover, an equal number of psychologists declared ESP
to be an impossibility, a view expressed by only 2% of all
other respondents (Wagner & Monnet, 1979).


Daryl J. Bem, Department of Psychology, Cornell University;
Charles Honorton, Department of Psychology, University of
Edinburgh, Edinburgh, Scotland.

Sadly, Charles Honorton died of a heart attack on November 4, 
1992, 9 days before this article was accepted for publication. He 
was 46. Parapsychology has lost one of its most valued contribu- 
tors. I have lost a valued friend. 

This collaboration had its origins in a 1983 visit I made to 
Honorton’s Psychophysical Research Laboratories (PRL) in 
Princeton, New Jersey, as one of several outside consultants 
brought in to examine the design and implementation of the ex- 
perimental protocols. 

Preparation of this article was supported, in part, by grants to
Charles Honorton from the American Society for Psychical
Research and the Parapsychology Foundation, both of New York
City. The work at PRL summarized in the second half of this
article was supported by the James S. McDonnell Foundation of St.
Louis, Missouri, and by the John E. Fetzer Foundation of
Kalamazoo, Michigan.

Helpful comments on drafts of this article were received from 
Deborah Delanoy, Edwin May, Donald McCarthy, Robert Morris, 
John Palmer, Robert Rosenthal, Lee Ross, Jessica Utts, Philip 
Zimbardo, and two anonymous reviewers.

Correspondence concerning this article should be addressed to 
Daryl J. Bem, Department of Psychology, Uris Hall, Cornell
University, Ithaca, New York 14853. (Electronic mail may be
sent to d.bem@cornell.edu).


Psychologists are probably more skeptical about psi for 
several reasons. First, we believe that extraordinary 
claims require extraordinary proof. And although our col- 
leagues from other disciplines would probably agree with 
this dictum, we are more likely to be familiar with the 
methodological and statistical requirements for sustaining 
such claims, as well as with previous claims that failed ei- 
ther to meet those requirements or to survive the test of 
successful replication. Even for ordinary claims, our
conventional statistical criteria are conservative. The sacred
p = .05 threshold is a constant reminder that it is far more
sinful to assert that an effect exists when it does not (the
Type I error) than to assert that an effect does not exist
when it does (the Type II error).

Second, most of us distinguish sharply between
phenomena whose explanations are merely obscure or
controversial (e.g., hypnosis) and phenomena such as psi that
would appear to fall outside our current explanatory
framework altogether. (Some would characterize this as
the difference between the unexplained and the
inexplicable.) In contrast, many laypersons treat all exotic
psychological phenomena as epistemologically equivalent; many
even consider déjà vu to be a psychic phenomenon. The
blurring of this critical distinction is aided and abetted by
the mass media, "new age" books and mind-power courses,
and "psychic" entertainers who present both genuine
hypnosis and fake "mind reading" in the course of a single
performance. Accordingly, most laypersons would not
have to revise their conceptual model of reality as
radically as we would to assimilate the existence of psi. For
us, psi is simply more extraordinary.

Finally, research in cognitive and social psychology has
sensitized us to the errors and biases that plague intuitive
attempts to draw valid inferences from the data of
everyday experience (Gilovich, 1991; Nisbett & Ross, 1980;
Tversky & Kahneman, 1971). This leads us to give
virtually no probative weight to anecdotal or journalistic
reports of psi, the main source cited by our academic
colleagues as evidence for their beliefs about psi (Wagner &
Monnet, 1979).

Ironically, however, psychologists are probably not more 
familiar than others with recent experimental research on 
psi. Like most psychological research, parapsychological 
research is reported primarily in specialized journals; un- 
like most psychological research, however, contemporary 
parapsychological research is not usually reviewed or 

Meta-Analyses of the Ganzfeld Database

In 1985 and 1986, the Journal of Parapsychology de- 
voted two entire issues to a critical examination of the 
ganzfeld database. The 1985 issue comprised two contri- 
butions: (a) a meta-analysis and critique by Ray Hyman 
(1985), a cognitive psychologist and skeptical critic of 
parapsychological research, and (b) a competing
meta-analysis and rejoinder by Charles Honorton (1985), a
parapsychologist and major contributor to the ganzfeld
database. The 1986 issue contained four commentaries on 
the Hyman-Honorton exchange, a joint communique by 
Hyman and Honorton, and six additional commentaries 
on the joint communique itself. We summarize the major 
issues and conclusions here. 

Replication Rates 

Rates by study. Hyman's meta-analysis covered 42 psi
ganzfeld studies reported in 34 separate reports written 
or published from 1974 through 1981. One of the first 
problems he discovered in the database was multiple 
analysis. As noted earlier, it is possible to calculate sev- 
eral indexes of psi performance in a ganzfeld experiment 
and, furthermore, to subject those indexes to several kinds 
of statistical treatment. Many investigators reported mul- 
tiple indexes or applied multiple statistical tests without 
adjusting the criterion significance level for the number of 
tests conducted. Worse, some may have "shopped" among
the alternatives until finding one that yielded a
significantly successful outcome. Honorton agreed that this was
a problem. 

Accordingly, Honorton applied a uniform test on a 
common index across all studies from which the pertinent 
datum could be extracted, regardless of how the investiga- 
tors had analyzed the data in the original reports. He se- 
lected the proportion of hits as the common index because 
it could be calculated for the largest subset of studies: 28 
of the 42 studies. The hit rate is also a conservative index 
because it discards most of the rating information; a
second-place ranking (a "near miss") receives no more
credit than a last-place ranking. Honorton then calculated
the exact binomial probability and its associated z score 
for each study. 

Of the 28 studies, 23 (82%) had positive z scores (p =
4.6 x 10^-4, exact binomial test with p = q = .5). Twelve of
the studies (43%) had z scores that were independently
significant at the 5% level (p = 3.5 x 10^-9, binomial test
with 28 studies, p = .05, and q = .95), and 7 of the studies
(25%) were independently significant at the 1% level (p =
9.8 x 10^-9). The composite Stouffer z score across the 28
studies was 6.60 (p = 2.1 x 10^-11).1 A more conservative
estimate of significance can be obtained by including 10
additional studies that also used the relevant judging
procedure but did not report hit rates. If these studies are
assigned a mean z score of zero, the Stouffer z across all 38
studies becomes 5.67 (p = 7.3 x 10^-9).
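
The binomial and Stouffer figures in this paragraph can be checked with a few lines of Python. This is a minimal sketch, not the authors' own computation: the helper names `binom_upper_tail` and `stouffer_z` are illustrative, and because the individual study z scores are not listed here, the 28 equal z values below are a hypothetical stand-in chosen only so that their composite equals the reported 6.60.

```python
from math import comb, sqrt

def binom_upper_tail(n, k, p):
    """Exact probability of k or more successes in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def stouffer_z(z_scores):
    """Sum of the z scores divided by the square root of their number."""
    return sum(z_scores) / sqrt(len(z_scores))

# 23 of 28 studies with positive z scores, tested against p = q = .5
p_positive = binom_upper_tail(28, 23, 0.5)
# 12 of 28 studies significant at the 5% level; 7 of 28 at the 1% level
p_sig_05 = binom_upper_tail(28, 12, 0.05)
p_sig_01 = binom_upper_tail(28, 7, 0.01)

# Hypothetical stand-in: 28 equal z scores whose composite is 6.60
z_28 = [6.60 / sqrt(28)] * 28
# Ten additional studies, each assigned z = 0 as in the text
z_38 = z_28 + [0.0] * 10

print(f"binomial tails: {p_positive:.2e}, {p_sig_05:.2e}, {p_sig_01:.2e}")
print(f"Stouffer z, 28 studies: {stouffer_z(z_28):.2f}")
print(f"Stouffer z, 38 studies: {stouffer_z(z_38):.2f}")
```

Padding with zeros leaves the sum of z scores unchanged and only enlarges the divisor, which is why the composite shrinks by the factor sqrt(28/38).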

Thus, whether one considers only the studies for which 
the relevant information is available or includes a null es- 
timate for the additional studies for which the information 
is not available, the aggregate results cannot reasonably 


Stouffer's z is computed by dividing the sum of the z scores for
the individual studies by the square root of the number of studies
(Rosenthal, 1978).


be attributed to chance. And, by design, the cumulative 
outcome reported here cannot be attributed to the infla- 
tion of significance levels through multiple analysis. 

Rates by laboratory. One objection to estimates such as 
those just described is that studies from a common labora- 
tory are not independent of one another (Parker, 1978). 
Thus, it is possible for one or two investigators to be dis- 
proportionately responsible for a high replication rate 
whereas other, independent investigators are unable to 
obtain the effect.

The ganzfeld database is vulnerable to this possibility. 
The 28 studies providing hit rate information were con- 
ducted by investigators in 10 different laboratories. One 
laboratory contributed 9 of the studies, Honorton’s own 
laboratory contributed 5, 2 other laboratories contributed
3 each, 2 contributed 2 each, and the remaining 4 labora- 
tories each contributed 1. Thus, half of the studies were 
conducted by only 2 laboratories, 1 of them Honorton's 
own. 

Accordingly, Honorton calculated a separate Stouffer z 
score for each laboratory. Significantly positive outcomes 
were reported by 6 of the 10 laboratories, and the
combined z score across laboratories was 6.16 (p = 3.6 x
10^-10). Even if all of the studies conducted by the 2 most
prolific laboratories are discarded from the analysis, the
Stouffer z across the 8 other laboratories remains
significant (z = 3.67, p = 1.2 x 10^-4). Four of these studies are
significant at the 1% level (p = 9.2 x 10^-6, binomial test
with 14 studies, p = .01, and q = .99), and each was
contributed by a different laboratory. Thus, even though the
total number of laboratories in this database is small, 
most of them have reported significant studies, and the 
significance of the overall effect does not depend on just 
one or two of them. 

Selective Reporting 

In recent years, behavioral scientists have become
increasingly aware of the "file-drawer" problem: the
likelihood that successful studies are more likely to be
published than unsuccessful studies, which are more likely to
be consigned to the file drawers of their disappointed
investigators (Bozarth & Roberts, 1972; Sterling, 1959).
Parapsychologists were among the first to become
sensitive to the problem, and, in 1975, the Parapsychological
Association Council adopted a policy opposing the selec- 
tive reporting of positive outcomes. As a consequence, 
negative findings have been routinely reported at the as- 
sociation's meetings and in its affiliated publications for 
almost two decades. As has already been shown, more 
than half of the ganzfeld studies included in the meta- 
analysis yielded outcomes whose significance falls short of 
the conventional .05 level. 

A variant of the selective reporting problem arises from 
what Hyman (1985) has termed the "retrospective study."
An investigator conducts a small set of exploratory trials.
If they yield null results, they remain exploratory and
never become part of the official record; if they yield posi- 
tive results, they are defined as a study after the fact and 
are submitted for publication. In support of this possibil- 
ity, Hyman noted that there are more significant studies 
in the database with fewer than 20 trials than one would 
expect under the assumption that, all other things being 
equal, statistical power should increase with the square 
root of the sample size. Although Honorton questioned the 

In psi ganzfeld studies, the hit rate itself provides a
straightforward descriptive measure of effect size, but this 
measure cannot be compared directly across studies
because they do not all use a four-stimulus judging set and,
hence, do not all have a chance baseline of .25. The next
most obvious candidate, the difference in each study
between the hit rate observed and the hit rate expected
under the null hypothesis, is also intuitively descriptive but
is not appropriate for statistical analysis because not all
differences between proportions that are equal are equally
detectable (e.g., the power to detect the difference between
.55 and .25 is different from the power to detect the
difference between .50 and .20).

To provide a scale of equal detectability, Cohen (1988) 
devised the effect size index h, which involves an arcsine 
transformation on the proportions before calculation of 
their difference. Cohen's h is quite general and can assess 
the difference between any two proportions drawn from 
independent samples or between a single proportion and 
any specified hypothetical value. For the 28 studies exam- 
ined in the meta-analyses, h was .28, with a 95% confi- 
dence interval from .11 to .45. 
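
As a hedged illustration of the arcsine transformation, the sketch below computes Cohen's h for the two pairs of proportions mentioned earlier (.55 vs. .25 and .50 vs. .20). The function name `cohens_h` is illustrative and not taken from any source.

```python
from math import asin, sqrt

def cohens_h(p1, p2):
    """Cohen's h: difference between arcsine-transformed proportions."""
    return 2 * asin(sqrt(p1)) - 2 * asin(sqrt(p2))

# Both pairs differ by .30 in raw proportion, yet their h values differ
h_first = cohens_h(0.55, 0.25)
h_second = cohens_h(0.50, 0.20)
print(f"h(.55, .25) = {h_first:.3f}")
print(f"h(.50, .20) = {h_second:.3f}")
```

The two values come out at roughly .62 and .64, which shows numerically why equal raw differences are not equally detectable.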

But because values of h do not provide an intuitively
descriptive scale, Rosenthal and Rubin (1989; Rosenthal,
1991) have recently suggested a new index, π, which
applies specifically to one-sample, multiple-choice data of
the kind obtained in ganzfeld experiments. In particular,
π expresses all hit rates as the proportion of hits that
would have been obtained if there had been only two
equally likely alternatives, essentially a coin flip. Thus, π
ranges from 0 to 1, with .5 expected under the null
hypothesis. The formula is

\[ \pi = \frac{P(k - 1)}{P(k - 2) + 1}, \]

where P is the raw proportion of hits and k is the number
of alternative choices available. Because π has such a
straightforward intuitive interpretation, we use it (or its
conversion back to an equivalent four-alternative hit rate)
throughout this article whenever it is applicable.

For the 28 studies examined in the meta-analyses, the 
mean value of π was .62, with a 95% confidence interval
from .55 to .69. This corresponds to a four-alternative hit 
rate of 35%, with a 95% confidence interval from 28% to 
43%. 
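
The conversion between a raw hit rate and the Rosenthal and Rubin index can be sketched as follows. The forward function follows the formula given above; the inverse is obtained here by solving that formula for P and is not stated in the text. Both function names are illustrative.

```python
def pi_index(hit_rate, k):
    """Rosenthal-Rubin index: hit rate rescaled to a two-choice equivalent."""
    return hit_rate * (k - 1) / (hit_rate * (k - 2) + 1)

def hit_rate_from_pi(pi, k):
    """Inverse of pi_index: the k-alternative hit rate implied by pi."""
    return pi / ((k - 1) - pi * (k - 2))

# Chance performance with four alternatives maps to .5
print(pi_index(0.25, 4))
# A 35% four-alternative hit rate maps to roughly .62, and back again
print(round(pi_index(0.35, 4), 3))
print(round(hit_rate_from_pi(0.62, 4), 3))
```

With k = 2 both functions reduce to the identity, which matches the description of the index as a two-alternative equivalent.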

Cohen (1988, 1992) has also categorized effect sizes into 
small , medium, and large, with medium denoting an effect 
size that should be apparent to the naked eye of a careful 
observer. For a statistic such as π, which indexes the
deviation of a proportion from .5, Cohen considers .65 to be a
medium effect size: A statistically unaided observer
should be able to detect the bias of a coin that comes up
heads on 65% of the trials. Thus, at .62, the psi ganzfeld
effect size falls just short of Cohen's naked-eye criterion.
From the phenomenology of the ganzfeld experimenter,
the corresponding hit rate of 35% implies that he or she
will see a subject obtain a hit approximately every third
session rather than every fourth. 

It is also instructive to compare the psi ganzfeld effect 
with the results of a recent medical study that sought to 
determine whether aspirin can prevent heart attacks 
(Steering Committee of the Physicians’ Health Study Re- 
search Group, 1988). The study was discontinued after 6 


years because it was already clear that the aspirin
treatment was effective (p < .00001) and it was considered
unethical to keep the control group on placebo medication.
The study was widely publicized as a major medical
breakthrough. But despite its undisputed reality and
practical importance, the size of the aspirin effect is quite
small: Taking aspirin reduces the probability of suffering
a heart attack by only .008. The corresponding effect size
(h) is .068, about one third to one fourth the size of the psi
ganzfeld effect (Atkinson et al., 1993, p. 236; Utts, 1991b). 

In sum, we believe that the psi ganzfeld effect is large 
enough to be of both theoretical interest and potential 
practical importance. 

Experimental Correlates of the Psi Ganzfeld Effect 

We showed earlier that the technique of correlating 
variables with effect sizes across studies can help to as- 
sess whether methodological flaws might have produced 
artifactual positive outcomes. The same technique can be 
used more affirmatively to explore whether an effect 
varies systematically with conceptually relevant
variations in experimental procedure. The discovery of such
correlates can help to establish an effect as genuine, sug- 
gest ways of increasing replication rates and effect sizes, 
and enhance the chances of moving beyond the simple 
demonstration of an effect to its explanation. This strat- 
egy is only heuristic, however. Any correlates discovered 
must be considered quite tentative, both because they 
emerge from post hoc exploration and because they neces- 
sarily involve comparisons across heterogeneous studies 
that differ simultaneously on many interrelated variables, 
known and unknown. Two such correlates emerged from 
the meta-analyses of the psi ganzfeld effect. 

Single- versus multiple-image targets. Although most of 
the 28 studies in the meta-analysis used single pictures as 
targets, 9 (conducted by three different investigators) 
used View Master stereoscopic slide reels that presented 
multiple images focused on a central theme. Studies using 
the View Master reels produced significantly higher hit 
rates than did studies using the single-image targets (50% 
vs. 34%), t(26) = 2.22, p = .035, two-tailed.

Sender-receiver pairing. In 17 of the 28 studies, partici- 
pants were free to bring in friends to serve as senders. In 
8 studies, only laboratory-assigned senders were used. 
(Three studies used no sender.) Unfortunately, there is no 
record of how many participants in the former studies ac- 
tually brought in friends. Nevertheless, those 17 studies 
(conducted by six different investigators) had significantly 
higher hit rates than did the studies that used only
laboratory-assigned senders (44% vs. 26%), t(23) = 2.39, p =
.025, two-tailed.

The Joint Communique 

After their published exchange in 1985, Hyman and 
Honorton agreed to contribute a joint communique to the
subsequent discussion that was published in 1986. First 
they set forth their areas of agreement and disagreement: 

We agree that there is an overall significant effect in this 
data base that cannot reasonably be explained by selective 
reporting or multiple analysis. We continue to differ over 
the degree to which the effect constitutes evidence for psi, 
but we agree that the final verdict awaits the outcome of fu- 
ture experiments conducted by a broader range of investiga- 

Randomization. The random selection of the target and
sequencing of the judging set were controlled by a
noise-based random number generator interfaced to the
computer. Extensive testing confirmed that the generator was
providing a uniform distribution of values throughout the
full target range (1-160). Tests on the actual frequencies
observed during the experiments confirmed that targets
were, on average, selected uniformly from among the 4
clips within each target set and that the 4 judging
sequences used were uniformly distributed across sessions.

Additional control features. The receiver's and sender's 
rooms were sound-isolated, electrically shielded chambers 
with single-door access that could be continuously moni- 
tored by the experimenter. There was two-way intercom 
communication between the experimenter and the re- 
ceiver but only one-way communication into the sender’s 
room; thus, neither the experimenter nor the receiver 
could monitor events inside the sender's room. The
archival record for each session includes an audiotape
containing the receiver's mentation during the ganzfeld
period and all verbal exchanges between the experimenter
and the receiver throughout the experiment. 

The automated ganzfeld protocol has been examined by 
several dozen parapsychologists and behavioral re- 
searchers from other fields, including well-known critics 
of parapsychology. Many have participated as subjects or 
observers. All have expressed satisfaction with the han- 
dling of security issues and controls. 

Parapsychologists have often been urged to employ ma- 
gicians as consultants to ensure that the experimental 
protocols are not vulnerable either to inadvertent sensory 
leakage or to deliberate cheating. Two "mentalists,"
magicians who specialize in the simulation of psi, have
examined the autoganzfeld system and protocol. Ford Kross, a
professional mentalist and officer of the mentalists'
professional organization, the Psychic Entertainers
Association, provided the following written statement: "In my
professional capacity as a mentalist, I have reviewed
Psychophysical Research Laboratories' automated ganzfeld
system and found it to provide excellent security against
deception by subjects" (personal communication, May
1989).

Daryl J. Bem has also performed as a mentalist for
many years and is a member of the Psychic Entertainers
Association. As mentioned in the author note, this article
had its origins in a 1983 visit he made to Honorton's
laboratory, where he was asked to critically examine the
research protocol from the perspective of a mentalist, a
research psychologist, and a subject. Needless to say, this
article would not exist if he did not concur with Ford
Kross's assessment of the security procedures.

Experimental Studies 6 

Altogether, 100 men and 140 women participated as
receivers in 354 sessions during the research program. The
participants ranged in age from 17 to 74 years (M = 37.3,
SD = 11.8), with a mean formal education of 15.6 years
(SD = 2.0). Eight separate experimenters, including
Honorton, conducted the studies.


6 A recent review of the original computer files uncovered a 
duplicate record in the autoganzfeld database. This has now been 
eliminated, reducing by one the number of subjects and sessions. 
As a result, some of the numbers presented in this article differ 
slightly from those in Honorton et al. (1990). 


The experimental program included three pilot and
eight formal studies. Five of the formal studies used
novice (first-time) participants who served as the receiver
in one session each. The remaining three formal studies
used experienced participants.

Pilot studies. Sample sizes were not preset in the three
pilot studies. Study 1 comprised 22 sessions and was
conducted during the initial development and testing of the
autoganzfeld system. Study 2 comprised 9 sessions testing
a procedure in which the experimenter, rather than the
receiver, served as the judge at the end of the session.
Study 3 comprised 35 sessions and served as practice for
participants who had completed the allotted number of
sessions in the ongoing formal studies but who wanted
additional ganzfeld experience. This study also included
several demonstration sessions when TV film crews were
present.


Novice studies. Studies 101-104 were each designed to
test 50 participants who had had no prior ganzfeld
experience; each participant served as the receiver in a single
ganzfeld session. Study 104 included 16 of 20 students
recruited from the Juilliard School in New York City to test
an artistically gifted sample. Study 105 was initiated to
accommodate the overflow of participants who had been
recruited for Study 104, including the four remaining
Juilliard students. The sample size for this study was set to
25, but only 6 sessions had been completed when the
laboratory closed. For purposes of exposition, we divided the
56 sessions from Studies 104 and 105 into two parts:
Study 104/105(a) comprises the 36 non-Juilliard
participants and Study 104/105(b) comprises the 20 Juilliard
students.


Study 201. This study was designed to retest the most
promising participants from the previous studies. The
number of trials was set to 20, but only 7 sessions with 3
participants had been completed when the laboratory
closed.

Study 301. This study was designed to compare static
and dynamic targets. The sample size was set to 50
sessions. Twenty-five experienced participants each served
as the receiver in 2 sessions. Unknown to the participants,
the computer control program was modified to ensure that
they would each have 1 session with a static target and 1
session with a dynamic target.

Study 302. This study was designed to examine a dy- 
namic target set that had yielded a particularly high hit 
rate in the previous studies. The study involved experi- 
enced participants who had had no prior experience with 
this particular target set and who were unaware that only 
one target set was being sampled. Each served as the re- 
ceiver in a single session. The design called for the study 
to continue until 15 sessions were completed with each of 
the targets, but only 25 sessions had been completed 
when the laboratory closed.

The 11 studies just described comprise all sessions
conducted during the 6.6 years of the program. There is no
"file drawer" of unreported sessions.


Results 


Overall hit rate. As in the earlier meta-analysis,
receivers' ratings were analyzed by tallying the proportion
of hits achieved and calculating the exact binomial proba- 
bility for the observed number of hits compared with the 
chance expectation of .25. As noted earlier, 240 partici- 

Table 2

Study 302: Expected Hit Rate and Proportion of Sessions in Which Each Video Clip Was Ranked First When It Was a Target and When It Was a Decoy

              Relative      Relative
              frequency     frequency of   Expected   Ranked first   Ranked first
Video clip    as target     first-place    hit rate   when target    when decoy     Difference   Fisher's p
                            ranking        (%)

Tidal Wave    .28 (7/25)    .24 (6/25)     --         --             .11 (2/18)     --           --
Snakes        .12 (3/25)    .12 (3/25)     --         --             .05 (1/22)     --           --
Sex Scene     .16 (4/25)    .08 (2/25)     --         --             .05 (1/21)     --           --
Bugs Bunny    .44 (11/25)   .56 (14/25)    --         .82 (9/11)     .36 (5/14)     --           --
Overall                                    34.08      --             --             --

Note. -- marks an entry that is not legible in this copy.

sessions of a study are more successful than later
sessions. If there were such an effect, then studies with fewer
sessions would show larger effect sizes because they
would end before a decline could set in. To check this
possibility, we computed point-biserial correlations between
hits (1) or misses (0) and the session number within each 
of the 10 studies. All of the correlations hovered around 
zero; six were positive, four were negative, and the overall 
mean was .01. 

An inspection of Table 1 reveals that the negative corre- 
lation derives primarily from the two studies with the 
largest effect sizes: the 20 sessions with the Juilliard
students and the 7 sessions of Study 201, the study specifically
designed to retest the most promising participants
from the previous studies. Accordingly, it seems likely 
that the larger effect sizes of these two studies— and 
hence the significant negative correlation between the 
number of sessions and the effect size — reflect genuine 
performance differences between these two small, highly 
selected samples and other autoganzfeld participants. 

Study 302 . All of the studies except Study 302 randomly 
sampled from a pool of 160 static and dynamic targets. 
Study 302 sampled from a single, dynamic target set that 
had yielded a particularly high hit rate in the previous 
studies. The four film clips in this set consisted of a scene 
of a tidal wave from the movie Clash of the Titans, a
high-speed sex scene from A Clockwork Orange, a scene of
crawling snakes from a TV documentary, and a scene 
from a Bugs Bunny cartoon. 

The experimental design called for this study to
continue until each of the clips had served as the target 15
times. Unfortunately, the premature termination of this
study at 25 sessions left an imbalance in the frequency
with which each clip had served as the target. This means
that the high hit rate observed (64%) could well be
inflated by response biases.

As an illustration, water imagery is frequently reported 
by receivers in ganzfeld sessions whereas sexual imagery 
is rarely reported. (Some participants are probably
reluctant both to report sexual imagery and to give the highest
rating to the sex-related clip.) If a video clip containing
popular imagery (such as water) happens to appear as a
target more frequently than a clip containing unpopular
imagery (such as sex), a high hit rate might simply reflect
the coincidence of those frequencies of occurrence with
participants' response biases. And, as the second column
of Table 2 reveals, the tidal wave clip did in fact appear
more frequently as the target than did the sex clip. More
generally, the second and third columns of Table 2 show
that the frequency with which each film clip was ranked
first closely matches the frequency with which each
appeared as the target.

One can adjust for this problem by using the observed
frequencies in these two columns to compute the hit rate
expected if there were no psi effect. In particular, one can
multiply each proportion in the second column by the
corresponding proportion in the third column, yielding the
joint probability that the clip was the target and that it
was ranked first, and then sum across the four clips. As
shown in the fourth column of Table 2, this computation
yields an overall expected hit rate of 34.08%. When the
observed hit rate of 64% is compared with this baseline,
the effect size (h) is .61. As shown in Table 1, this is
equivalent to a four-alternative hit rate of 54%, or a π
value of .78, and is statistically significant (z = 3.04, p =
.0012).
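
The adjustment described above can be reproduced from the counts that survive in Table 2. The sketch below is illustrative only. Two steps are assumptions rather than statements from the text: the number of hits per clip is inferred as the clip's first-place rankings minus its first-place rankings as a decoy, and the z score is approximated as h times the square root of the number of sessions.

```python
from math import asin, sqrt

N_SESSIONS = 25

# Per clip: (times it was the target, times ranked first,
#            times ranked first while serving as a decoy)
clips = {
    "Tidal Wave": (7, 6, 2),
    "Snakes": (3, 3, 1),
    "Sex Scene": (4, 2, 1),
    "Bugs Bunny": (11, 14, 5),
}

# Expected hit rate with no psi effect: sum over clips of
# P(clip is target) * P(clip is ranked first)
expected = sum((t / N_SESSIONS) * (f / N_SESSIONS)
               for t, f, _ in clips.values())

# Inferred hits: first-place rankings that were not decoy rankings
hits = sum(f - d for _, f, d in clips.values())
observed = hits / N_SESSIONS

# Cohen's h against the adjusted baseline, and an approximate z
h = 2 * asin(sqrt(observed)) - 2 * asin(sqrt(expected))
z_approx = h * sqrt(N_SESSIONS)

print(f"expected hit rate = {expected:.4f}")
print(f"observed hit rate = {observed:.2f} ({hits} hits)")
print(f"h = {h:.2f}, approximate z = {z_approx:.2f}")
```

Under these assumptions the expected rate comes to 213/625 = .3408 and the inferred hit count is 16 of 25, in line with the 34.08% baseline and 64% hit rate quoted in the text.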

The psi effect can be seen even more clearly in the
remaining columns of Table 2, which control for the
differential popularity of the imagery in the clips by displaying
how frequently each was ranked first when it was the
target compared with how frequently it was ranked first
when it was one of the control clips (decoys). As can be
seen, each of the four clips was selected as the target
relatively more frequently when it was the target than when
it was a decoy, a difference that is significant for three of
the four clips. On average, a clip was identified as the
target 58% of the time when it was the target and only 14%
of the time when it was a decoy.

ory (z = 2.23, p < .05, two-tailed). You now have cause to run
an additional group of 10 subjects. What do you think the
probability is that the results will be significant, by a one- 
tailed test, separately for this group? (p. 105) 

The median estimate was .85, with 9 out of 10 respon- 
dents providing an estimate greater than .60. The correct 
answer is approximately .48. 

As Rosenthal (1990) has warned: "Given the levels of 
statistical power at which we normally operate, we have 
no right to expect the proportion of significant results that 
we typically do expect, even if in nature there is a very 
real and very important effect" (p. 16). In this regard, it is
again instructive to consider the medical study that found 
a highly significant effect of aspirin on the incidence of 
heart attacks. The study monitored more than 22,000 
subjects. Had the investigators monitored 3,000 subjects, 
they would have had less than an even chance of finding a 
conventionally significant effect. Such is life with small
effect sizes.

Given its larger effect size, the prospects for success- 
fully replicating the psi ganzfeld effect are not quite so 
daunting, but they are probably still grimmer than
intuition would suggest. If the true hit rate is in fact about
34% when 25% is expected by chance, then an experiment
with 30 trials (the mean for the 28 studies in the original
meta-analysis) has only about 1 chance in 6 of finding an
effect significant at the .05 level with a one-tailed test. A
50-trial experiment boosts that chance to about 1 in 3.
One must escalate to 100 trials in order to come close to
the break-even point at which one has a 50-50 chance of
finding a statistically significant effect (Utts, 1986).
(Recall that only 2 of the 11 autoganzfeld studies yielded 
results that were individually significant at the conven- 
tional .05 level.) Those who require that a psi effect be 
statistically significant every time before they will seri- 
ously entertain the possibility that an effect really exists 
know not what they ask. 
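
The power figures in this paragraph can be explored with an exact binomial calculation. The sketch below is one plausible way to do so and is not necessarily the method used by Utts (1986); the function names and the choice of an exact one-tailed test are assumptions, so the results may differ somewhat from the rounded odds quoted above.

```python
from math import comb

def upper_tail(n, k, p):
    """Exact probability of k or more hits in n trials with hit rate p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def exact_power(n, p_null=0.25, p_true=0.34, alpha=0.05):
    """Power of a one-tailed exact binomial test at level alpha."""
    # Smallest hit count whose one-tailed p value under the null is <= alpha
    critical = next(k for k in range(n + 2)
                    if upper_tail(n, k, p_null) <= alpha)
    # Probability of reaching that count when the true hit rate is p_true
    return upper_tail(n, critical, p_true)

power_30 = exact_power(30)
power_50 = exact_power(50)
power_100 = exact_power(100)
for n, power in ((30, power_30), (50, power_50), (100, power_100)):
    print(f"n = {n:3d}: power = {power:.2f}")
```

Because the binomial distribution is discrete, the attained significance level sits below .05 and varies with n, so power computed this way rises unevenly rather than smoothly as trials are added.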

Significance Versus Effect Size 

The preceding discussion is unduly pessimistic, how- 
ever, because it perpetuates the tradition of worshipping 
the significance level. Regular readers of this journal are 
likely to be familiar with recent arguments imploring
behavioral scientists to overcome their slavish dependence
on the significance level as the ultimate measure of virtue
and instead to focus more of their attention on effect sizes:
"Surely, God loves the .06 nearly as much as the .05"
(Rosnow & Rosenthal, 1989, p. 1277). Accordingly, we
suggest that achieving a respectable effect size with a 
methodologically tight ganzfeld study would be a perfectly 
welcome contribution to the replication effort, no matter 
how untenurable the p level renders the investigator. 

Career consequences aside, this suggestion may seem 
quite counterintuitive. Again, Tversky and Kahneman 
(1971) have provided an elegant demonstration. They 
asked several of their colleagues to consider an investiga- 
tor who runs 15 subjects and obtains a significant t value 
of 2.46. Another investigator attempts to duplicate the 
procedure with the same number of subjects and obtains a 
result in the same direction but with a nonsignificant 
value of t. Tversky and Kahneman then asked their colleagues
to indicate the highest level of t in the replication
study they would describe as a failure to replicate. The
majority of their colleagues regarded t = 1.70 as a failure
to replicate. But if the data from two such studies (t = 2.46
and t = 1.70) were pooled, the t for the combined data
would be about 3.00 (assuming equal variances):

Thus, we are faced with a paradoxical state of affairs, in 
which the same data that would increase our confidence in 
the finding when viewed as part of the original study, shake 
our confidence when viewed as an independent study. 
(Tversky & Kahneman, 1971, p. 108) 
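
The pooled value of about 3.00 can be reproduced approximately. The sketch below assumes that each study is a one-sample (or paired-difference) design with 15 subjects and that both share the same within-study standard deviation; the design is an assumption made for illustration, since the quoted passage does not specify it, and the function name is hypothetical.

```python
from math import sqrt

def pooled_one_sample_t(t1, t2, n):
    """t for the combined sample of two n-subject one-sample studies,
    assuming both share the same within-study standard deviation."""
    s = 1.0                                        # common SD; it cancels out of t
    m1, m2 = t1 * s / sqrt(n), t2 * s / sqrt(n)    # means implied by each t
    grand_mean = (m1 + m2) / 2
    # Total sum of squares = within-study part + between-study part.
    ss = 2 * (n - 1) * s**2 + n * ((m1 - grand_mean)**2 + (m2 - grand_mean)**2)
    pooled_sd = sqrt(ss / (2 * n - 1))
    return grand_mean / (pooled_sd / sqrt(2 * n))

print(round(pooled_one_sample_t(2.46, 1.70, 15), 2))
```

Under these assumptions the combined t comes out just under 3, larger than either study's t alone, which is the point of the paradox.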

Such is the iron grip of the arbitrary .05. Pooling the 
data, of course, is what meta-analysis is all about. Ac- 
cordingly, we suggest that two or more laboratories could 
collaborate in a ganzfeld replication effort by conducting 
independent studies and then pooling them in meta-ana- 
lytic fashion, what one might call real-time meta-analy- 
sis. (Each investigator could then claim the pooled p 
level for his or her own curriculum vitae.) 

Maximizing Effect Size 

Rather than buying or borrowing larger sample sizes, 
those who seek to replicate the psi ganzfeld effect might 
find it more intellectually satisfying to attempt to maximize
the effect size by attending to the variables associated
with successful outcomes. Thus researchers who wish
to enhance the chances of successful replication should 
use dynamic rather than static targets. Similarly we ad- 
vise using participants with the characteristics we have 
reported to be correlated with successful psi performance. 
Random college sophomores enrolled in introductory psy- 
chology do not constitute the optimal subject pool. 

Finally, we urge ganzfeld researchers to read carefully 
the detailed description of the warm social ambiance that 
Honorton et al. (1990) sought to create in the autoganzfeld 
laboratory. We believe that the social climate created in 
psi experiments is a critical determinant of their success 
or failure. 

The Problem of “Other” Variables

This caveat about the social climate of the ganzfeld ex- 
periment prompted one reviewer of this article to worry 
that this provided "an escape clause" that weakens the 
falsifiability of the psi hypothesis: "Until Bem and Honorton
can provide operational criteria for creating a
warm social ambiance, the failure of an experiment with 
otherwise adequate power can always be dismissed as 
due to a lack of warmth." 

Alas, it is true; we devoutly wish it were otherwise. 
But the operation of unknown variables in moderating 
the success of replications is a fact of life in all of the sciences.
Consider, for example, an earlier article in this
journal by Spence (1964). He reviewed studies testing 
the straightforward derivation from Hullian learning 
theory that high-anxiety subjects should condition more
strongly than low-anxiety subjects. This hypothesis was 
confirmed 94% of the time in Spence's own laboratory at 
the University of Iowa but only 63% of the time in labo- 
ratories at other universities. In fact, Kimble and his as- 
sociates at Duke University and the University of North 
Carolina obtained results in the opposite direction in two 
of three experiments. 

In searching for a post hoc explanation, Spence (1964) 
noted that "a deliberate attempt was made in the Iowa 
studies to provide conditions in the laboratory that might 
elicit some degree of emotionality. Thus, the experi- 
menter was instructed to be impersonal and quite formal 
... and did not try to put [subjects] at ease or allay any 

Crae, 1992), which assesses six different facets of the
extroversion-introversion factor.

The sender. In contrast to this information about the receiver
in psi experiments, virtually nothing is known
about the characteristics of a good sender or about the ef- 
fects of the sender's relationship with the receiver. As has 
been shown, the initial suggestion from the meta-analysis 
of the original ganzfeld database that psi performance 
might be enhanced when the sender and receiver are 
friends was not replicated at a statistically significant 
level in the autoganzfeld studies. 

A number of parapsychologists have entertained the 
more radical hypothesis that the sender may not even be a 
necessary element in the psi process. In the terminology of 
parapsychology, the sender-receiver procedure tests for
the existence of telepathy, anomalous communication between
two individuals; however, if the receiver is somehow
picking up the information from the target itself, it would
be termed clairvoyance, and the presence of the sender
would be irrelevant (except for possible psychological rea- 
sons such as expectation effects). 

At the time of his death, Honorton was planning a se- 
ries of autoganzfeld studies that would systematically 
compare sender and no-sender conditions while keeping 
both the receiver and the experimenter blind to the condi- 
tion of the ongoing session. In preparation, he conducted a 
meta-analytic review of ganzfeld studies that used no 
sender. He found 12 studies with a median of 33.5 ses- 
sions, conducted by seven investigators. The overall effect 
size (π) was .56, which corresponds to a four-alternative
hit rate of 29%. But this effect size does not reach statistical
significance (Stouffer Z = 1.31, p = .095). So far, then,
there is no firm evidence for psi in the ganzfeld in the absence
of a sender. (There are, however, nonganzfeld studies
in the literature that do report significant evidence for
clairvoyance, including a classic card-guessing experiment
conducted by J. B. Rhine and Pratt [1954].)
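
The two figures in this paragraph can be checked with a few lines of arithmetic. The sketch assumes that the effect size is Rosenthal and Rubin's proportion index, which maps a k-alternative hit rate P onto a two-choice scale by P(k - 1) / (1 + P(k - 2)), so that chance corresponds to .50. That identification, and the function names, are assumptions made for illustration.

```python
from math import erfc, sqrt

def one_tailed_p(z):
    """Upper-tail probability of a standard normal deviate."""
    return 0.5 * erfc(z / sqrt(2))

def hit_rate_from_pi(pi, k=4):
    """Invert pi = P(k - 1) / (1 + P(k - 2)) to get the k-choice hit rate P."""
    return pi / ((k - 1) - pi * (k - 2))

print(round(one_tailed_p(1.31), 3))       # one-tailed p for the Stouffer Z
print(round(hit_rate_from_pi(0.56), 2))   # four-alternative hit rate
```

Because the reported .56 is itself rounded, the inversion lands near, rather than exactly on, the 29% quoted in the text.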

The Physics of Psi 

The psychological level of theorizing discussed earlier
does not, of course, address the conundrum that makes psi 
phenomena anomalous in the first place: their presumed 
incompatibility with our current conceptual model of 
physical reality. Parapsychologists differ widely from one 
another in their taste for theorizing at this level, but sev- 
eral whose training lies in physics or engineering have 
proposed physical (or biophysical) theories of psi phenom- 
ena (an extensive review of theoretical parapsychology 
was provided by Stokes, 1987). Only some of these theories
would force a radical revision in our conception of
physical reality. 

Those who follow contemporary debates in modern
physics, however, will be aware that several phenomena
predicted by quantum theory and confirmed by experi- 
ment are themselves incompatible with our current con- 
ceptual model of physical reality. Of these, it is the 1982 
empirical confirmation of Bell's theorem that has created 
controversy among philosophers
and the few physicists who are willing to speculate on
such matters (Cushing & McMullin, 1989; Herbert, 1987). 

In brief, Bell's theorem states that any model of reality
that is compatible with quantum mechanics must be nonlocal:
It must allow for the possibility that the results of
observations at two arbitrarily distant locations can be 
correlated in ways that are incompatible with any physi- 
cally permissible causal mechanism. 


Several possible models of reality that incorporate nonlocality
have been proposed by both philosophers and
physicists. Some of these models clearly rule out psi-like
information transfer, others permit it, and some actually
require it. Thus, at a grander level of theorizing, some
parapsychologists believe that one of the more radical
models of reality compatible with both quantum mechanics
and psi will eventually come to be accepted. If and
when that occurs, psi phenomena would cease to be 
anomalous. 

But we have learned that all such talk provokes most of 
our colleagues in psychology and in physics to roll their 
eyes and gnash their teeth. So let’s just leave it at that. 

Skepticism Revisited 

More generally, we have learned that our colleagues’ 
tolerance for any kind of theorizing about psi is strongly 
determined by the degree to which they have been con- 
vinced by the data that psi has been demonstrated. We 
have further learned that their diverse reactions to the 
data themselves are strongly determined by their a priori 
beliefs about and attitudes toward a number of quite gen- 
eral issues, some scientific, some not. In fact, several 
statisticians believe that the traditional hypothesis test- 
ing methods used in the behavioral sciences should be 
abandoned in favor of Bayesian analyses, which take into 
account a person's a priori beliefs about the phenomenon 
under investigation (e.g., Bayarri & Berger, 1991; Daw- 
son, 1991). 

In the final analysis, however, we suspect that both 
one's Bayesian a prioris and one's reactions to the data
are ultimately determined by whether one was more 
severely punished in childhood for Type I or Type II er- 
rors. 

Replication and Meta-Analysis in
Parapsychology 


Jessica Utts


Abstract. Parapsychology, the laboratory study of psychic phenomena, 
has had its history interwoven with that of statistics. Many of the 
controversies in parapsychology have focused on statistical issues, and 
statistical models have played an integral role in the experimental 
work. Recently, parapsychologists have been using meta-analysis as a 
tool for synthesizing large bodies of work. This paper presents an 
overview of the use of statistics in parapsychology and offers a summary
of the meta-analyses that have been conducted. It begins with some
anecdotal information about the involvement of statistics and statisti- 
cians with the early history of parapsychology. Next, it is argued that 
most nonstatisticians do not appreciate the connection between power 
and “successful” replication of experimental effects. Returning to
parapsychology, a particular experimental regime is examined by summarizing
an extended debate over the interpretation of the results. A new set
of experiments designed to resolve the debate is then reviewed. Finally, 
meta-analyses from several areas of parapsychology are summarized. It 
is concluded that the overall evidence indicates that there is an anoma- 
lous effect in need of an explanation. 

Key words and phrases: Effect size, psychic research, statistical contro- 
versies, randomness, vote-counting. 

1. INTRODUCTION

In a June 1990 Gallup Poll, 49% of the 1236 
respondents claimed to believe in extrasensory per- 
ception (ESP), and one in four claimed to have had 
a personal experience involving telepathy (Gallup 
and Newport, 1991). Other surveys have shown 
even higher percentages; the University of 
Chicago’s National Opinion Research Center re- 
cently surveyed 1473 adults, of which 67% claimed 
that they had experienced ESP (Greeley, 1987). 

Public opinion is a poor arbiter of science, however,
and experience is a poor substitute for the
scientific method. For more than a century, small 
numbers of scientists have been conducting labora- 
tory experiments to study phenomena such as 
telepathy, clairvoyance and precognition, collec- 
tively known as “psi” abilities. This paper will 
examine some of that work, as well as some of the 
statistical controversies it has generated. 


Jessica Utts is Associate Professor, Division of 


Parapsychology, as this field is called, has been a 
source of controversy throughout its history. Strong 
beliefs tend to be resistant to change even in the
face of data, and many people, scientists included, 
seem to have made up their minds on the question 
without examining any empirical data at all. A 
critic of parapsychology recently acknowledged that 
“The level of the debate during the past 130 years 
has been an embarrassment for anyone who would 
like to believe that scholars and scientists adhere 
to standards of rationality and fair play” (Hyman, 
1985a, page 89). While much of the controversy has 
focused on poor experimental design and potential 
fraud, there have been attacks and defenses of the 
statistical methods as well, sometimes calling into 
question the very foundations of probability and 
statistical inference. 

Most of the criticisms have been leveled by psychologists.
For example, a 1988 report of the U.S.
National Academy of Sciences concluded that “The
committee finds no scientific justification from
research conducted over a period of 130 years for
the existence of parapsychological phenomena

One of the first American researchers to
use statistical methods in parapsychology was 
John Edgar Coover, who was the Thomas Welton 
Stanford Psychical Research Fellow in the Psychol- 
ogy Department at Stanford University from 1912 
to 1937 (Dommeyer, 1975). In 1917, Coover published
a large volume summarizing his work
(Coover, 1917). Coover believed that his results 
were consistent with chance, but others have ar- 
gued that Coover’s definition of significance was 
too strict (Dommeyer, 1976). For example, in one 
evaluation of his telepathy experiments, Coover 
found a two-tailed p-value of 0.0062. He concluded,
“Since this value, then, lies within the field of 
chance deviation, although the probability of its 
occurrence by chance is fairly low, it cannot be 
accepted as a decisive indication of some cause 
beyond chance which operated in favor of success in 
guessing” (Coover, 1917, page 82). On the next 
page, he made it explicit that he would require a 
p-value of 0.0000221 to declare that something 
other than chance was operating. 

It was during the summer of 1930, with the 
card-guessing experiments of J. B. Rhine at Duke 
University, that parapsychology began to take hold 
as a laboratory science. Rhine’s laboratory still 
exists under the name of the Foundation for Re- 
search on the Nature of Man, housed at the edge of 
the Duke University campus. 

It wasn’t long after Rhine published his first 
book, Extrasensory Perception, in 1934, that the
attacks on his methodology began. Since his claims 
were wholly based on statistical analyses of his 
experiments, the statistical methods were closely 
scrutinized by critics anxious to find a conventional 
explanation for Rhine’s positive results. 

The most persistent critic was a psychologist 
from McGill University named Chester Kellogg 
(Mauskopf and McVaugh, 1979). Kellogg’s main 
argument was that Rhine was using the binomial 
distribution (and normal approximation) on a se- 
ries of trials that were not independent. The experi- 
ments in question consisted of having a subject 
guess the order of a deck of 25 cards, with five each 
of five symbols, so technically Kellogg was correct. 
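
The size of the dependence Kellogg pointed to can be worked out exactly. The sketch below compares the binomial variance of the hit count with the variance under a shuffled closed deck, under the added assumption (not stated in the text) that the guesser also calls each symbol exactly five times; other calling patterns give somewhat different values, and the function names are illustrative.

```python
from fractions import Fraction

def closed_deck_hit_variance(symbols=5, copies=5):
    """Exact variance of the hit count when a guesser who calls each symbol
    `copies` times is scored against a randomly shuffled closed deck."""
    n = symbols * copies
    p = Fraction(1, symbols)
    var_single = p * (1 - p)
    # Covariance of hits at two positions where the same symbol was called.
    cov_same = Fraction(copies, n) * Fraction(copies - 1, n - 1) - p * p
    # Covariance of hits at two positions where different symbols were called.
    cov_diff = Fraction(copies, n) * Fraction(copies, n - 1) - p * p
    pairs_same = symbols * copies * (copies - 1)      # ordered pairs
    pairs_diff = n * (n - 1) - pairs_same
    return n * var_single + pairs_same * cov_same + pairs_diff * cov_diff

def binomial_hit_variance(symbols=5, copies=5):
    """Variance if the 25 trials were independent with hit probability 1/5."""
    n = symbols * copies
    p = Fraction(1, symbols)
    return n * p * (1 - p)

print(binomial_hit_variance(), closed_deck_hit_variance())
```

Under this assumption the expected number of hits is 5 in both models and the variances differ only slightly, which illustrates how the criticism could be technically correct without materially changing the conclusions.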

By 1937, several mathematicians and statis- 
ticians had come to Rhine’s aid. Mauskopf and 
McVaugh (1979) speculated that since statistics was 
itself a young discipline, “a number of statisticians 
were equally outraged by Kellogg, whose argu- 
ments they saw as discrediting their profession” 
(page 258). The major technical work, which ac- 
knowledged that Kellogg’s criticisms were accurate 
but did little to change the significance of the 

and Greenwood, 1937). Stuart, who had been an
undergraduate in mathematics at Duke, was one of 
Rhine’s early subjects and continued to work with 
him as a researcher until Stuart’s death in 1947. 
Greenwood was a Duke mathematician, who appar- 
ently converted to a statistician at the urging of 
Rhine. 

Another prominent figure who was distressed 
with Kellogg’s attack was E. V. Huntington, a 
mathematician at Harvard. After corresponding 
with Rhine, Huntington decided that, rather than 
further confuse the public with a technical reply to 
Kellogg’s arguments, a simple statement should be 
made to the effect that the mathematical issues in 
Rhine’s work had been resolved. Huntington must 
have successfully convinced his former student, 
Burton Camp of Wesleyan, that this was a wise 
approach. Camp was the 1937 President of IMS. 
When the annual meetings were held in December 
of 1937 (jointly with AMS and AAAS), Camp 
released a statement to the press that read: 

Dr. Rhine’s investigations have two aspects: 
experimental and statistical. On the exper- 
imental side mathematicians, of course, 
have nothing to say. On the statistical side, 
however, recent mathematical work has 
established the fact that, assuming that the 
experiments have been properly performed, 
the statistical analysis is essentially valid. If 
the Rhine investigation is to be fairly attacked, 
it must be on other than mathematical grounds 
[Camp, 1937]. 

One statistician who did emerge as a critic was 
William Feller. In a talk at the Duke Mathemati- 
cal Seminar on April 24, 1940, Feller raised three 
criticisms to Rhine’s work (Feller, 1940). They had 
been raised before by others (and continue to be 
raised even today). The first was that inadequate 
shuffling of the cards resulted in additional infor- 
mation from one series to the next. The second was 
what is now known as the “file-drawer effect,” 
namely, that if one combines the results of pub- 
lished studies only, there is sure to be a bias in 
favor of successful studies. The third was that the 
results were enhanced by the use of optional stop- 
ping, that is, by not specifying the number of trials 
in advance. All three of these criticisms were ad- 
dressed in a rejoinder by Greenwood and Stuart 
(1940), but Feller was never convinced. Even in its 
third edition published in 1968, his book An Intro- 
duction to Probability Theory and Its Applications 
still contains his conclusion about Greenwood and 
Stuart: “Both their arithmetic and their experiments
have a distinct tinge of the supernatural

to their colleagues at a professional meeting, with
the question:

An investigator has reported a result that you 
consider implausible. He ran 15 subjects, and 
reported a significant value, t = 2.46. Another 
investigator has attempted to duplicate his pro- 
cedure, and he obtained a nonsignificant value 
of t with the same number of subjects. The 
direction was the same in both sets of data. 
You are reviewing the literature. What is the 
highest value of t in the second set of data that 
you would describe as a failure to replicate?
[1982, page 28].

In reporting their results, Tversky and Kahneman
stated:

The majority of our respondents regarded t = 
1.70 as a failure to replicate. If the data of two 
such studies (t = 2.46 and t = 1.70) are pooled,
the value of t for the combined data is about 
3.00 (assuming equal variances). Thus, we are 
faced with a paradoxical state of affairs, in 
which the same data that would increase our 
confidence in the finding when viewed as part 
of the original study, shake our confidence 
when viewed as an independent study [1982,
page 28].

At a recent presentation to the History and Philosophy
of Science Seminar at the University of
California at Davis, I asked the following question.
Two scientists, Professors A and B, each have a
theory they would like to demonstrate. Each plans
to run a fixed number of Bernoulli trials and then
test H0: p = 0.25 versus Ha: p > 0.25. Professor A
has access to large numbers of students each
semester to use as subjects. In his first experiment,
he runs 100 subjects, and there are 33 successes
(p = 0.04, one-tailed). Knowing the importance of
replication, Professor A runs an additional 100 subjects
as a second experiment. He finds 36 successes
(p = 0.009, one-tailed).

Professor B only teaches small classes. Each 
quarter, she runs an experiment on her students to 
test her theory. She carries out ten studies this 
way, with the results in Table 1. 

I asked the audience by a show of hands to 
indicate whether or not they felt the scientists had 
successfully demonstrated their theories. Professor 
A’s theory received overwhelming support, with 
approximately 20 votes, while Professor B’s theory 
received only one vote. 

If you aggregate the results of the experiments 
for each professor, you will notice that each con- 

with 71 as opposed to 69 successful trials. The
one-tailed p-values for the combined trials are 
0.0017 for Professor A and 0.0006 for Professor B. 
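
The aggregation can be reproduced with an exact binomial tail. This is a minimal sketch using the counts given in the text and in Table 1, with chance probability 0.25; the helper names are illustrative.

```python
from math import comb

def upper_tail(k, n, p=0.25):
    """One-tailed p-value: P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# (n, successes) pairs taken from the text and from Table 1.
professor_a = [(100, 33), (100, 36)]
professor_b = [(10, 4), (15, 6), (17, 6), (25, 8), (30, 10),
               (40, 13), (18, 7), (10, 5), (15, 5), (20, 7)]

def pooled(studies):
    """Total trials, total successes and pooled one-tailed p-value."""
    n = sum(trials for trials, _ in studies)
    hits = sum(successes for _, successes in studies)
    return n, hits, upper_tail(hits, n)

for name, studies in (("A", professor_a), ("B", professor_b)):
    n, hits, p = pooled(studies)
    print(name, n, hits, round(p, 4))
```

None of Professor B's ten studies is individually significant, yet her pooled result is the stronger of the two.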

To address the question of replication more ex- 
plicitly, I also posed the following scenario. In 
December of 1987, it was decided to prematurely 
terminate a study on the effects of aspirin in reduc- 
ing heart attacks because the data were so convinc- 
ing (see, e.g., Greenhouse and Greenhouse, 1988; 
Rosenthal, 1990a). The physician-subjects had been 
randomly assigned to take aspirin or a placebo. 
There were 104 heart attacks among the 11,037 
subjects in the aspirin group, and 189 heart attacks 
among the 11,034 subjects in the placebo group 
(chi-square = 25.01, p < 0.00001). 

After showing the results of that study, I pre- 
sented the audience with two hypothetical experi- 
ments conducted to try to replicate the original 
result, with outcomes in Table 2. 

I asked the audience to indicate which one they 
thought was a more successful replication. The au- 
dience chose the second one, as would most journal 
editors, because of the “significant p-value.” In
fact, the first replication has almost exactly the 
same proportion of heart attacks in the two groups 
as the original study and is thus a very close repli- 
cation of that result. The second replication has 


Table 1
Attempted replications for Professor B

 n    Number of successes    One-tailed p-value
10             4                    0.22
15             6                    0.15
17             6                    0.23
25             8                    0.17
30            10                    0.20
40            13                    0.18
18             7                    0.14
10             5                    0.08
15             5                    0.31
20             7                    0.21


Table 2
Hypothetical replications of the aspirin/heart attack study

             Replication #1      Replication #2
              Heart attack        Heart attack
              Yes       No        Yes       No
Aspirin        11     1156         20     2314
Placebo        19     1090         48     2170
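
The contrast between the two hypothetical replications can be made concrete by computing, for each study, the heart attack rate in each group alongside the chi-square statistic. The sketch below uses the counts given for the original study and in Table 2, and assumes the Pearson chi-square without continuity correction; the function name is illustrative.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# (aspirin yes, aspirin no, placebo yes, placebo no)
studies = {
    "original":       (104, 11037 - 104, 189, 11034 - 189),
    "replication #1": (11, 1156, 19, 1090),
    "replication #2": (20, 2314, 48, 2170),
}

for name, (a, b, c, d) in studies.items():
    aspirin_rate = a / (a + b)
    placebo_rate = c / (c + d)
    print(f"{name}: aspirin {aspirin_rate:.2%}, placebo {placebo_rate:.2%}, "
          f"chi-square {chi_square_2x2(a, b, c, d):.2f}")
```

With one degree of freedom the .05 critical value is 3.84, so the first replication falls short of significance even though its rates nearly match those of the original study.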

target, rather than being forced to make a choice
from a small discrete set of possibilities. Various 
types of target material have been used, including 
pictures, short segments of movies on video tapes, 
actual locations and small objects. 

Despite the more complex target material, the 
statistical methods used to analyze these experi- 
ments are similar to those for forced-choice experi- 
ments. A typical experiment proceeds as follows. 
Before conducting any trials, a large pool of poten- 
tial targets is assembled, usually in packets of four. 
Similarity of targets within a packet is kept to a 
minimum, for reasons made clear below. At the 
start of an experimental session, after the subject is 
sequestered in an isolated room, a target is selected 
at random from the pool. A sender is placed in 
another room with the target. The subject is asked 
to provide a verbal or written description of what 
he or she thinks is in the target, knowing only that 
it is a photograph, an object, etc. 

After the subject’s description has been recorded 
and secured against the potential for later alter- 
ation, a judge (who may or may not be the subject) 
is given a copy of the subject’s description and the 
four possible targets that were in the packet with 
the correct target. A properly conducted experi- 
ment either uses video tapes or has two identical 
sets of target material and uses the duplicate set 
for this part of the process, to ensure that clues 
such as fingerprints don’t give away the answer. 
Based on the subject’s description, and of course on 
a blind basis, the judge is asked to either rank the 
four choices from most to least likely to have been 
the target, or to select the one from the four that 
seems to best match the subject’s description. If 
ranks are used, the statistical analysis proceeds by 
summing the ranks over a series of trials and 
comparing the sum to what would be expected by 
chance. If the selection method is used, a “direct 
hit” occurs if the correct target is chosen, and the 
number of direct hits over a series of trials is 
compared to the number expected in a binomial 
experiment with p = 0.25. 

Note that the subjects’ responses cannot be con- 
sidered to be “random” in any sense, so probability 
assessments are based on the random selection of 
the target and decoys. In a correctly designed ex- 
periment, the probability of a direct hit by chance 
is 0.25 on each trial, regardless of the response, and 
the trials are independent. These and other issues 
related to analyzing free-response experiments are 
discussed by Utts (1991). 
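
The sum-of-ranks analysis described above can be sketched as follows. Under the chance hypothesis each rank is uniform on 1 through 4, so a sum over n independent trials has mean 2.5n and variance 1.25n. The ranks below are made up for illustration, and the normal approximation is an assumption that is only rough for a series this short.

```python
from math import erfc, sqrt

def rank_sum_z(ranks, choices=4):
    """z-score for a sum of ranks; positive z means better than chance
    (rank 1 = judged most likely to be the target)."""
    n = len(ranks)
    mean = n * (choices + 1) / 2         # chance expectation of the sum
    var = n * (choices**2 - 1) / 12      # variance of one uniform rank, times n
    return (mean - sum(ranks)) / sqrt(var)

def one_tailed_p(z):
    """Upper-tail probability of a standard normal deviate."""
    return 0.5 * erfc(z / sqrt(2))

hypothetical_ranks = [1, 2, 1, 3, 1, 4, 2, 1, 2, 1, 3, 1]   # made-up session ranks
z = rank_sum_z(hypothetical_ranks)
print(round(z, 2), round(one_tailed_p(z), 3))
```

The direct-hit analysis discards everything but whether the rank was 1, so the rank-sum approach can be more sensitive when near misses are common.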

4.2 The Psi Ganzfeld Experiments 

isolation technique originally developed by Gestalt
psychologists for other purposes. Evidence from 
spontaneous case studies and experimental work 
had led parapsychologists to a model proposing that 
psychic functioning may be masked by sensory in- 
put and by inattention to internal states (Honorton, 
1977). The ganzfeld procedure was specifically de- 
signed to test whether or not reduction of external 
“noise” would enhance psi performance. 

In these experiments, the subject is placed in a 
comfortable reclining chair in an acoustically 
shielded room. To create a mild form of sensory 
deprivation, the subject wears headphones through 
which white noise is played, and stares into a 
constant field of red light. This is achieved by 
taping halved translucent ping-pong balls over the 
eyes and then illuminating the room with red light. 
In the psi ganzfeld experiments, the subject speaks 
into a microphone and attempts to describe the 
target material being observed by the sender in a 
distant room. 

At the 1982 Annual Meeting of the Parapsycho- 
logical Association, a debate took place over the 
degree to which the results of the psi ganzfeld 
experiments constituted evidence of psi abilities. 
Psychologist and critic Ray Hyman and parapsy- 
chologist Charles Honorton each analyzed the re- 
sults of all known psi ganzfeld experiments to date, 
and they reached strikingly different conclusions 
(Honorton, 1985b; Hyman, 1985b). The debate con- 
tinued with the publication of their arguments in 
separate articles in the March 1985 issue of the 
Journal of Parapsychology. Finally, in the Decem- 
ber 1986 issue of the Journal of Parapsychology, 
Hyman and Honorton (1986) wrote a joint article 
in which they highlighted their agreements and 
disagreements and outlined detailed criteria for 
future experiments. That same issue contained 
commentaries on the debate by 10 other authors. 

The data base analyzed by Hyman and Honorton 
(1986) consisted of results taken from 34 reports 
written by a total of 47 authors. Honorton counted 
42 separate experiments described in the reports, of 
which 28 reported enough information to determine 
the number of direct hits achieved. Twenty-three of
the studies (55%) were classified by Honorton as 
having achieved statistical significance at 0.05. 

4.3 The Vote-Counting Debate 

Vote-counting is the term commonly used for the 
technique of drawing inferences about an experi- 
mental effect by counting the number of significant 
versus nonsignificant studies of the effect. Hedges 
and Olkin (1985) give a detailed analysis of the 
inadequacy of this method, showing that it is more


Approved For Release 2003/04/18 : CIA-RDP96-00789R002700020001-0 









REPLICATION IN PARAPSYCHOLOGY 


ization, multiple tests used without adjusting the 
significance level (thus inflating the significance 
level from the nominal 5%) and failure to use a
duplicate set of targets for the judging process (thus 
allowing possible clues such as fingerprints). Using 
cluster and factor analyses, the 12 binary flaw 
variables were combined into three new variables, 
which Hyman named General Security, Statistics 
and Controls. 

Several analyses were then conducted. The one 
reported with the most detail is a factor analysis 
utilizing 17 variables for each of 36 studies. Four
factors emerged from the analysis. From these,
Hyman concluded that security had increased over 
the years, that the significance level tended to be 
inflated the most for the most complex studies and 
that both effect size and level of significance were 
correlated with the existence of flaws. 

Following his factor analysis, Hyman picked the 
three flaws that seemed to be most highly corre- 
lated with success, which were inadequate atten- 
tion to both randomization and documentation and 
the potential for ordinary communication between 
the sender and receiver. A regression equation was 
then computed using each of the three flaws as 
dummy variables, and the effect size for the experi- 
ment as the dependent variable. From this equa- 
tion, Hyman concluded that a study without these 
three flaws would be predicted to have a hit rate of 
27%. He concluded that this is “well within the 
statistical neighborhood of the 25% chance rate” 
(1985b, page 37), and thus “the ganzfeld psi data 
base, despite initial impressions, is inadequate ei- 
ther to support the contention of a repeatable study 
or to demonstrate the reality of psi” (page 38). 

Honorton discounted both Hyman’s flaw classifi- 
cation and his analysis. He did not deny that flaws 
existed, but he objected that Hyman’s analysis was 
faulty and impossible to interpret. Honorton asked 
psychometrician David Saunders to write an Ap- 
pendix to his article, evaluating Hyman’s analysis. 
Saunders first criticized Hyman’s use of a factor 
analysis with 17 variables (many of which were 
dichotomous) and only 36 cases and concluded that 
“the entire analysis is meaningless” (Saunders, 
1985, page 87). He then noted that Hyman’s choice 
of the three flaws to include in his regression anal- 
ysis constituted a clear case of multiple analysis, 
since there were 84 possible sets of three that could 
have been selected (out of nine potential flaws), and 
Hyman chose the set most highly correlated with 
effect size. Again, Saunders concluded that “any 
interpretation drawn from [the regression analysis]
must be regarded as meaningless” (1985, page 88). 
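
Saunders's count of 84 is the number of ways to choose three flaws from nine. The sketch below enumerates them and, as a rough illustration only, shows how quickly a nominal 5% error rate inflates across that many looks if the looks were independent. Independence is an idealization, since the triples share flaws, and the flaw names are placeholders.

```python
from itertools import combinations
from math import comb

flaws = [f"flaw_{i}" for i in range(1, 10)]    # placeholder names for nine flaws
triples = list(combinations(flaws, 3))
print(len(triples), comb(9, 3))

# If each triple had an independent 5% chance of looking impressive by luck
# (an idealization), the chance that at least one of them does:
familywise = 1 - 0.95 ** len(triples)
print(round(familywise, 3))
```

Selecting the best of many candidate analyses after seeing the data is what makes the resulting regression hard to interpret.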

Hyman in his capacity as Chair of the National
Academy of Sciences’ Subcommittee on Parapsy- 
chology. Using Hyman’s flaw classifications and a 
multivariate analysis, Harris and Rosenthal con- 
cluded that “Our analysis of the effects of flaws on 
study outcome lends no support to the hypothesis 
that ganzfeld research results are a significant 
function of the set of flaw variables” (1988b, 
page 3). 

Hyman and Honorton were in the process of 
preparing papers for a second round of debate when 
they were invited to lunch together at the 1986 
Meeting of the Parapsychological Association. They 
discovered that they were in general agreement on 
several major issues, and they decided to coauthor 
a “Joint Communique” (Hyman and Honorton, 
1986). It is clear from their paper that they both 
thought it was more important to set the stage for 
future experimentation than to continue the techni- 
cal arguments over the current data base. In the 
abstract to their paper, they wrote: 

We agree that there is an overall significant 
effect in this data base that cannot reasonably 
be explained by selective reporting or multiple 
analysis. We continue to differ over the degree 
to which the effect constitutes evidence for psi, 
but we agree that the final verdict awaits the 
outcome of future experiments conducted by a 
broader range of investigators and according to 
more stringent standards [page 351]. 

The paper then outlined what these standards 
should be. They included controls against any kind 
of sensory leakage, thorough testing and documen- 
tation of randomization methods used, better re- 
porting of judging and feedback protocols, control 
for multiple analyses and advance specification of 
number of trials and type of experiment. Indeed, 
any area of research could benefit from such a 
careful list of procedural recommendations. 

4.5 Rosenthal’s Meta-Analysis 

The same issue of the Journal of Parapsychology 
in which the Joint Communique appeared also car- 
ried commentaries on the debate by 10 separate 
authors. In his commentary, psychologist Robert 
Rosenthal, one of the pioneers of meta-analysis in 
psychology, summarized the aspects of Hyman’s 
and Honorton’s work that would typically be in- 
cluded in a meta-analysis (Rosenthal, 1986). It is 
worth reviewing Rosenthal’s results so that they 
can be used as a basis of comparison for the more 
recent psi ganzfeld studies reported in Section 5. 
Rosenthal, like Hyman and Honorton, focused 
likely to be selected by the computer’s random 
number generator than any of the others in the set. 
The selection of the target by the computer is the 
only source of randomness in these experiments. 
This is an important point, and one that is often 
misunderstood. (See Utts, 1991, for elucidation.) 

Eighty of the targets were “dynamic,” consisting 
of scenes from movies, documentaries, and cartoons; 
80 were “static,” consisting of photographs, art 
prints and advertisements. The four targets within 
each set were all of the same type. Earlier studies 
indicated that dynamic targets were more likely to 
produce successful results, and one of the goals of 
the new experiments was to test that theory. 

The randomization procedure used to select the 
target and the order of presentation for judging was 
thoroughly tested before and during the experi- 
ments. A detailed description is given by Honorton 
et al. (1990, pages 118-120). 

Three of the 11 series were pilot series, five were 
formal series with novice receivers, and three were 
formal series with experienced receivers. The last 
series with experienced receivers was the only one 
that did not use the 160 targets. Instead, it used 
only one set of four dynamic targets in which one 
target had previously received several first place 
ranks and one had never received a first place 
rank. The receivers, none of whom had had prior 
exposure to that target pack, were not aware that 
only one target pack was being used. They each 
contributed one session only to the series. This will 
be called the “special series” in what follows. 

Except for two of the pilot series, numbers of 
trials were planned in advance for each series. 
Unfortunately, three of the formal series were not 
yet completed when the funding ran out, including 
the special series, and one pilot study with advance 
planning was terminated early when the experi- 
menter relocated. There were no unreported trials 
during the 6-year period under review, so there was 
no “file drawer.” 

Overall, there were 183 Rs who contributed only 
one trial and 58 who contributed more than one, for 
a total of 241 participants and 355 trials. Only 23 
Rs had previously participated in ganzfeld experi- 
ments, and 194 Rs (81%) had never participated in 
any parapsychological research. 

5.2 Results 

While acknowledging that no probabilistic con- 
clusions can be drawn from qualitative data, Hon- 
orton et al. (1990) included several examples of 
session excerpts that Rs identified as providing the 
basis for their target rating. To give a flavor for the 
rank, the first example is reproduced here. The 
target was a painting by Salvador Dali called 
“Christ Crucified.” The correct target received a 
first place rank. The part of the mentation R used 
to make this assessment read: 

. . . I think of guides, like spirit guides, leading 
me and I come into a court with a king. It’s 
quiet. . . . It’s like heaven. The king is something 
like Jesus. Woman. Now I’m just sort of 
summersaulting through heaven. . . . 
Brooding. . . . Aztecs, the Sun God. . . . High 
priest. . . . Fear. . . . Graves. Woman. 
Prayer. . . . Funeral. . . . Dark. 
Death. . . . Souls. . . . Ten Commandments. 
Moses. . . . [Honorton et al., 1990]. 

Over all 11 series, there were 122 direct hits in 
the 355 trials, for a hit rate of 34.4% (exact bino- 
mial p-value = 0.00005) when 25% were expected 
by chance. Cohen’s h is 0.20, and a 95% confidence 
interval for the overall hit rate is from 0.30 to 0.39. 
This calculation assumes, of course, that the proba- 
bility of a direct hit is constant and independent 
across trials, an assumption that may be questionable 
except under the null hypothesis of no psi 
abilities. 
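
The overall figures can be checked with a short Python sketch. It computes the exact upper-tail binomial probability for 122 or more hits in 355 trials at a chance rate of 1/4, and a 95% interval for the hit rate. The Wilson score interval is an assumed choice; the text does not say which interval method was used.

```python
from math import comb, sqrt

hits, trials = 122, 355  # totals quoted in the text; chance rate is 1/4

# Exact upper-tail binomial probability P(X >= hits) under p = 1/4,
# kept in integers so no rounding occurs before the final division.
tail_numerator = sum(comb(trials, k) * 3 ** (trials - k) for k in range(hits, trials + 1))
p_value = tail_numerator / 4 ** trials

# Wilson score interval for the hit rate (an assumed interval method).
z = 1.959964  # 97.5th percentile of the standard normal
p_hat = hits / trials
denom = 1 + z**2 / trials
center = (p_hat + z**2 / (2 * trials)) / denom
half_width = z * sqrt(p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2)) / denom
ci_low, ci_high = center - half_width, center + half_width

print(f"hit rate = {p_hat:.3f}")
print(f"exact one-sided p = {p_value:.6f}")
print(f"95% Wilson interval = ({ci_low:.2f}, {ci_high:.2f})")
```

Both calculations rest on the same constant-probability, independent-trials assumption noted above.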

Honorton et al. (1990) also calculated effect sizes 
for each of the 11 series and each of the eight 
experimenters. All but one of the series (the first 
novice series) had positive effect sizes, as did all of 
the experimenters. 

The special series with experienced Rs had an 
exceptionally high effect size with h = 0.81, corre- 
sponding to 16 direct hits out of 25 trials (64%), but 
the remaining series and the experimenters had 
relatively homogeneous effect sizes given the 
amount of variability expected by chance. If the 
special series is removed, the overall hit rate is 
32.1%, h = 0.16. Thus, the positive effects are not 
due to just one series or one experimenter. 
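
As a rough check on these effect sizes, Cohen's h can be computed as the difference of arcsine-transformed proportions against the 25% chance rate. The sketch below is illustrative; the function name is hypothetical, and agreement with the quoted values should be expected only to within rounding.

```python
from math import asin, sqrt

def cohens_h(p1: float, p0: float = 0.25) -> float:
    """Cohen's h: difference of arcsine-transformed proportions."""
    return 2 * asin(sqrt(p1)) - 2 * asin(sqrt(p0))

overall = cohens_h(122 / 355)                         # all 11 series
special = cohens_h(16 / 25)                           # the special series alone
without_special = cohens_h((122 - 16) / (355 - 25))   # remaining ten series

print(f"overall h = {overall:.3f}")
print(f"special series h = {special:.3f}")
print(f"without special series h = {without_special:.3f}")
```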

Of the 218 trials contributed by novices, 71 were 
direct hits (32.5%, h = 0.17), compared with 51 
hits in the 137 trials by those with prior ganzfeld 
experience (37%, h = 0.26). The hit rates and effect 
sizes were 31% (h = 0.14) for the combined pilot 
series, 32.5% (h = 0.17) for the combined formal 
novice series, and 41.5% (h = 0.35) for the combined 
experienced series. The last figure drops to 
31.6% if the outlier series is removed. Finally, 
without the outlier series the hit rate for the combined 
series where all of the planned trials were 
completed was 31.2% (h = 0.14), while it was 35% 
(h = 0.22) for the combined series that were termi- 
scores from zero for the lowest quality, to eight for 
the highest. They included features such as adequate 
randomization, preplanned analysis and automated 
recording of the results. The correlation 
between study quality and effect size was 0.081, 
indicating a slight tendency for higher quality 
studies to be more successful, contrary to claims by 
critics that the opposite would be true. There was 
a clear relationship between quality and year of 
publication, presumably because over the years 
experimenters in parapsychology have responded 
to suggestions from critics for improving their 
methodology. 

File Drawer. Following Rosenthal (1984), the 
authors calculated the “fail-safe N” indicating the 
number of unreported studies that would have to be 
sitting in file drawers in order to negate the significant 
effect. They found N = 14,268, or a ratio of 46 
unreported studies for each one reported. They also 
followed a suggestion by Dawes, Landman and 
Williams (1984) and computed the mean z for all 
studies with z > 1.65. If such studies were a random 
sample from the upper 5% tail of an N(0, 1) 
distribution, the mean z would be 2.06. In this case 
it was 3.61. They concluded that selective reporting 
could not explain these results. 
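
The benchmark value of 2.06 is the mean of a standard normal variable conditioned on falling in its upper 5% tail. A small Python sketch of that calculation, using only the standard library:

```python
from statistics import NormalDist

std_normal = NormalDist()            # N(0, 1)
cutoff = std_normal.inv_cdf(0.95)    # about 1.645, the upper 5% point

# Mean of a standard normal conditioned on exceeding the cutoff:
# E[Z | Z > c] = pdf(c) / (1 - cdf(c))
tail_mean = std_normal.pdf(cutoff) / (1 - std_normal.cdf(cutoff))

print(f"cutoff = {cutoff:.3f}, conditional mean = {tail_mean:.2f}")
```

An observed mean well above this benchmark is what one would not expect if the significant studies were merely the reported tail of a null distribution.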

Comparisons. Four variables were identified 
that appeared to have a systematic relationship to 
study outcome. The first was that the 25 studies 
using subjects selected on the basis of good past 
performance were more successful than the 223 
using unselected subjects, with mean effect sizes of 
0.051 and 0.008, respectively. Second, the 97 studies 
testing subjects individually were more successful 
than the 105 studies that used group testing; 
mean effect sizes were 0.021 and 0.004, respectively. 
Timing of feedback was the third moderating 
variable, but information was only available for 
104 studies. The 15 studies that never told the 
subjects what the targets were had a mean effect 
size of -0.001. Feedback after each trial produced 
the best results; the mean ES for the 47 studies 
was 0.035. Feedback after each set of trials resulted 
in a mean ES of 0.023 (21 studies), while 
delayed feedback (also 21 studies) yielded a mean 
ES of only 0.009. There is a clear ordering: as the 
gap between time of feedback and time of the 
actual guesses decreased, effect sizes increased. 

The fourth variable was the time interval be- 
tween the subject’s guess and the actual target 
selection, available for 144 studies. The best results 
were for the 31 studies that generated targets less 
than a second after the guess (mean ES = 0.045), 
while the worst were for the seven studies that 
trend, decreasing in order as the time interval 
increased from minutes to hours to days to weeks to 
months. 

6.2 Attempts to Influence Random Physical 
Systems 

Radin and Nelson (1989) examined studies de- 
signed to test the hypothesis that “The statistical 
output of an electronic RNG [random number gen- 
erator] is correlated with observer intention in ac- 
cordance with prespecified instructions” (page 
1502). These experiments typically involve RNGs 
based on radioactive decay, electronic noise or pseu- 
dorandom number sequences seeded with true ran- 
dom sources. Usually the subject is instructed to 
try to influence the results of a string of binary 
trials by mental intention alone. A typical protocol 
would ask a subject to press a button (thus starting 
the collection of a fixed-length sequence of bits), 
and then try to influence the random source to 
produce more zeroes or more ones. A run might 
consist of three successive button presses, one each 
in which the desired result was more zeroes or 
more ones, and one as a control with no conscious 
intention. A z score would then be computed for 
each button press. 

The 832 studies in the analysis were conducted 
from 1959 to 1987 and included 235 “control” studies, 
in which the output of the RNGs was recorded 
but there was no conscious intention involved. 
These were usually conducted before and during 
the experimental series, as tests of the RNGs. 

Results. The effect size measure used was again 
z/√n, where z was positive if more bits of the 
specified type were achieved. The mean effect size 
for control studies was not significantly different 
from zero (-1.0 × 10^-6). The mean effect size 
for the experimental studies was also very small, 
3.2 × 10^-4, but it was significantly higher than the 
mean ES for the control studies (z = 4.1). 
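
To make the protocol concrete, the following Python sketch computes a z score for one button press and the corresponding effect size z/√n. The function names and the 100-bit run are hypothetical illustrations, not details from Radin and Nelson (1989); the only assumption is that each bit is 0 or 1 with probability 1/2 under the null hypothesis.

```python
from math import sqrt

def run_z_score(bits, target_bit):
    """z score for one button press: excess of target bits over chance (p = 1/2)."""
    n = len(bits)
    hits = sum(1 for b in bits if b == target_bit)
    return (hits - n / 2) / sqrt(n / 4)

def effect_size(z, n):
    """Per-bit effect size z / sqrt(n)."""
    return z / sqrt(n)

# Hypothetical run: 100 bits with 60 ones, intention "more ones".
bits = [1] * 60 + [0] * 40
z = run_z_score(bits, target_bit=1)
print(z, effect_size(z, len(bits)))  # 2.0 0.2
```

The sign convention follows the text: z is positive when more bits of the specified type occur, so the same run scored against the intention "more zeroes" gives z = -2.0.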

Quality. Sixteen quality measures were defined 
and assigned to each study, under the four general 
categories of procedures, statistics, data and the 
RNG device. A score of 16 reflected the highest 
quality. The authors regressed mean effect size on 
mean quality for each investigator and found a 
slope of 2.5 × 10^-5 with standard error of 3.2 × 
10^-6, indicating little relationship between quality 
and outcome. They also calculated a weighted mean 
effect size, using quality scores as weights, and 
found that it was very similar to the unweighted 
mean ES. They concluded that “differences 
in methodological quality are not significant 
predictors of effect size” (page 1507). 


Approved For Release 2003/04/18 : CIA-RDP96-00789R002700020001-0 







The correlation between extroversion scores and 
ganzfeld rating scores was r = 0.18, with a 95% 
confidence interval from 0.05 to 0.30. This is consistent 
with the mean correlation of r = 0.20 for 
free-response experiments, determined from the 
meta-analysis. These correlations indicate that extroverted 
subjects can produce higher scores in 
free-response ESP tests. 

7. CONCLUSIONS 

Parapsychologists often make a distinction between 
“proof-oriented research” and “process-oriented 
research.” The former is typically conducted 
to test the hypothesis that psi abilities exist, 
while the latter is designed to answer questions 
about how psychic functioning works. Proof-oriented 
research has dominated the literature 
in parapsychology. Unfortunately, many of the 
studies used small samples and would thus be 
nonsignificant even if a moderate-sized effect 
exists. 

The recent focus on meta-analysis in parapsy- 
chology has revealed that there are small but 
consistently nonzero effects across studies, experi- 
menters and laboratories. The sizes of the effects in 
forced-choice studies appear to be comparable to 
those reported in some medical studies that had 
been heralded as breakthroughs. (See Section 5; 
also Honorton and Ferrari, 1989, page 301.) Free-response 
studies show effect sizes of far greater 
magnitude. 

A promising direction for future process-oriented 
research is to examine the causes of individual 
differences in psychic functioning. The ESP/ex- 
troversion meta-analysis is a step in that direction. 

In keeping with the idea of individual differ- 
ences, Bayes and empirical Bayes methods would 
appear to make more sense than the classical infer- 
ence methods commonly used, since they would 
allow individual abilities and beliefs to be modeled. 
Jefferys (1990) reported a Bayesian analysis of some 
of the RNG experiments and showed that conclu- 
sions were closely tied to prior beliefs even though 
hundreds of thousands of trials were available. 

It may be that the nonzero effects observed in the 
meta-analyses can be explained by something other 
than ESP, such as shortcomings in our understand- 
ing of randomness and independence. Nonetheless, 
there is an anomaly that needs an explanation. As 
I have argued elsewhere (Utts, 1987), research in 
parapsychology should receive more support from 
the scientific community. If ESP does not exist, 
there is little to be lost by erring in the direction of 
much to be gained by discovering how to enhance 
and apply these abilities to important world 
problems. 
1. INTRODUCTION 

There are many fascinating issues discussed in 
this paper. Several concern parapsychology itself 
and the interpretation of statistical methodology 
therein. We are not experts in parapsychology, and 
so have only one comment concerning such mat- 
ters: In Section 3 we briefly discuss the need to 
switch from P-values to Bayes factors in discussing 
evidence concerning parapsychology. 

A more general issue raised in the paper is that 
of replication. It is quite illuminating to consider 
the issue of replication from a Bayesian perspec- 
tive, and this is done in Section 2 of our discussion. 

2. REPLICATION 

Many insightful observations concerning replica- 
tion are given in the article, and these spurred us 
to determine if they could be quantified within 
Bayesian reasoning. Quantification requires clear 
delineation of the possible purposes of replication, 
and at least two are obvious. The first is simple 
reduction of random error, achieved by obtaining 
more observations from the replication. The second 
purpose is to search for possible bias in the original 
experiment: We use “bias” in a loose sense here, to 
refer to any of the huge number of ways in which 
the effects being measured by the experiment can 
differ from the actual effects of interest. Thus a 
clinical trial without a placebo can suffer a placebo 
“bias”; a survey can suffer a “bias” due to the 
sampling frame being unrepresentative of the 
actual population; and possible sources of bias 
in parapsychological experiments have been 
extensively discussed. 

Replication to Reduce Random Error 

If the sole goal of replication of an experiment is 
to reduce random error, matters are very straight- 
forward. Reviewing the Bayesian way of studying 
this issue is, however, useful and will be done 
through the following simple example. 
Example 1. Consider the example from Tversky 
and Kahneman (1982), in which an experiment 
results in a standardized test statistic of $z_1 = 2.46$. 
(We will assume normality to keep computations 
trivial.) The question is: What is the highest value 
of $z_2$ in a second set of data that would be considered 
a failure to replicate? Two possible precise 
versions of this question are: Question 1: What is 
the probability of observing $z_2$ for which the null 
hypothesis would be rejected in the replicated experiment? 
Question 2: What value of $z_2$ would 
leave one’s overall opinion about the null hypothesis 
unchanged? 

Consider the simple case where $Z_1 \sim N(z_1 \mid \theta, 1)$ 
and (independently) $Z_2 \sim N(z_2 \mid \theta, 1)$, where $\theta$ is 
the mean and 1 is the standard deviation of the 
normal distribution. Note that we are considering 
the case in which no experimental bias is suspected 
and so the means for each experiment are assumed 
to be the same. 

Suppose that it is desired to test $H_0: \theta \le 0$ versus 
$H_1: \theta > 0$, and suppose that initial prior opinion 
about $\theta$ can be described by the noninformative 
prior $\pi(\theta) = 1$. We consider the one-sided testing 
problem with a constant prior in this section, because 
it is known that then the posterior probability 
of $H_0$, to be denoted by $P(H_0 \mid \text{data})$, equals the 
P-value, allowing us to avoid complications arising 
from differences between Bayesian and classical 
answers. 

After observing $z_1 = 2.46$, the posterior distribution 
of $\theta$ is 

$$\pi(\theta \mid z_1) = N(\theta \mid 2.46, 1).$$ 

Question 1 then has the answer (using predictive 
Bayesian reasoning) 

$$P(\text{rejecting at level } \alpha \mid z_1) = \int_{c_\alpha}^{\infty} \int_{-\infty}^{\infty} \frac{1}{2\pi}\, e^{-\frac{1}{2}(z_2 - \theta)^2}\, e^{-\frac{1}{2}(\theta - 2.46)^2}\, d\theta\, dz_2 = 1 - \Phi\!\left(\frac{c_\alpha - 2.46}{\sqrt{2}}\right),$$ 

where $\Phi$ is the standard normal cdf and $c_\alpha$ is the 
(one-sided) critical value corresponding to the level, 
$\alpha$, of the test. For instance, if $\alpha = 0.05$, then this 
probability equals 0.7178, demonstrating that there 
is a quite substantial probability that the second 
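
The value 0.7178 can be reproduced numerically. The sketch below assumes, as in the example, a flat prior and unit-variance normal statistics, so that the predictive distribution of the second statistic given the first is normal with mean 2.46 and standard deviation √2. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()
z1 = 2.46      # observed statistic from the first experiment
alpha = 0.05   # one-sided level of the replication test

# With a flat prior, theta | z1 ~ N(z1, 1), so the predictive distribution of
# Z2 given z1 is normal with mean z1 and standard deviation sqrt(2):
# unit sampling variance plus unit posterior variance.
critical_value = std_normal.inv_cdf(1 - alpha)
predictive = NormalDist(mu=z1, sigma=sqrt(2))
prob_reject = 1 - predictive.cdf(critical_value)

print(f"P(reject at level {alpha} | z1 = {z1}) = {prob_reject:.4f}")
```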
A sensible candidate for the prior density $\pi(\beta)$ 
is the Cauchy$(0, V)$ density 

$$\pi(\beta) = \frac{1}{\pi V\left[1 + (\beta/V)^2\right]}.$$ 


Flat-tailed densities, such as this, are well known 
to have the property that when discordant data is 
observed (e.g., when $|y - x_2|$ is large), substantial 
mass shifts away from the prior center towards 
the likelihood center. It is easy to see that a normal 
prior for $\beta$ cannot have the desired behavior. 

Our first surprise in consideration of these priors 
was how small $V$ needed to be chosen in order for 
$P(H_0 \mid y, x_1)$ to be unaffected by the bias. For 
instance, even with $V = 1.54/100$ (recall that 1.54 
was the standard deviation of $Y$ from the original 
experiment), computation yields 
$P(H_0 \mid y, x_1) = 4.3 \times 10^{-6}$, compared with the P-value (and posterior 
probability from the original experiment assuming 
no bias) of $2.8 \times 10^{-7}$. There is a clear 
lesson here: even very small suspicions of bias can 
drastically alter a small P-value. Note that replication 
1 is very consistent with the presence of no 
bias, and so the posterior distribution for the bias 
remains tightly concentrated near zero; for instance, 
the mean of the posterior for $\beta$ is then 
$7.2 \times 10^{-6}$, and the standard deviation is 0.25. 

When we turned attention to replication 2, we 
found that it did not seriously change the prior 
perceptions of bias. Examination quickly revealed 
the reason; even the maximum likelihood estimate 
of the bias is no more than 1.4 standard deviations 
from zero, which is not enough to change strong 
prior beliefs. We, therefore, considered a third 
experiment, defined in Table 1. Transforming to 
approximate normality, as before, yields 

$$X_3 \sim N(x_3 \mid \theta, 3.48),$$ 

with $x_3 = 22.72$ being the actual observation. The 
maximum likelihood estimate of bias is now 3.95 
standard deviations from zero, so there is potential 
for a substantial change in opinion about the bias. 

Table 1 
Frequency of heart attacks in replication 3 

Sure enough, computation when $V = 1.54/100$ 
yields that $E[\beta \mid y, x_3] = -4.9$ with (posterior) 
standard deviation equal to 6.62, which is a dramatic 
shift from prior opinion (that $\beta$ is 
Cauchy$(0, 1.54/100)$). The effect of this is to essentially ignore 
the original experiment in overall assessments of 
evidence. For instance, 
$P(H_0 \mid y, x_3) = 3.81 \times 10^{-11}$, which is very close to 
$P(H_0 \mid x_3) = 3.29 \times 10^{-11}$. Note that, if $\beta$ were set equal to zero, the 
overall posterior probability of $H_0$ (and P-value) 
would be $2.62 \times 10^{-13}$. 
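
The posterior probability based on replication 3 alone can be checked approximately. The sketch below assumes that the replication is unbiased, that its statistic is normal with standard deviation 3.48 around the true effect, and that the prior on the effect is flat, so the posterior probability of the null is a single normal tail area. These assumptions are inferred from the surrounding text, and the helper name is hypothetical. Small differences from the quoted figure would be expected from rounding of 22.72 and 3.48.

```python
from math import erfc, sqrt

def normal_upper_tail(z):
    """P(Z > z) for a standard normal, via erfc for accuracy in the far tail."""
    return 0.5 * erfc(z / sqrt(2))

x3, sd3 = 22.72, 3.48   # replication 3 observation and its standard deviation

# Under a flat prior and no bias term, theta | x3 is normal with mean x3 and
# standard deviation sd3, so P(H0 | x3) = P(theta <= 0 | x3) = P(Z > x3 / sd3).
p_h0_given_x3 = normal_upper_tail(x3 / sd3)
print(f"P(H0 | x3) is roughly {p_h0_given_x3:.2e}")
```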

Thus Bayesian reasoning can reproduce the intu- 
ition that replication which indicates bias can cast 
considerable doubt on the original experiment, 
while replication which provides no evidence of 
bias leaves evidence from the original experiment 
intact. Such behavior seems only obtainable, however, 
with flat-tailed priors for bias (such as the 
Cauchy) that are very concentrated (in comparison 
with the experimental standard deviation) near 
zero. 


3. P-VALUES OR BAYES FACTORS? 

Parapsychology experiments usually consider 
testing of $H_0$: No parapsychological effect exists. 
Such null hypotheses are often realistically repre- 
sented as point nulls (see Berger and Delampady, 
1987, for the reason that care must be taken in 
such representation), in which case it is known that 
there is a large difference between P-values and 
posterior probabilities (see Berger and Delampady, 
1987, for review). The article by Jefferys (1990) 
dramatically illustrates this, showing that a very 
small P-value can actually correspond to evidence 
for $H_0$ when considered from a Bayesian perspective. 
(This is closely related to the famous “Jeffreys” 
paradox.) The argument in favor of the Bayesian 
approach here is very strong, since it can be shown 
that the conflict holds for virtually any sensible 
prior distribution; a Bayesian answer can be wrong 
if the prior information turns out to be inaccurate, 
but a Bayesian answer that holds for all sensible 
priors is unassailable. 

Since P-values simply cannot be viewed as mean- 
ingful in these situations, we found it of interest to 
reconsider the example in Section 5 from a Bayes 
factor perspective. We considered only analysis of 
the overall totals, that is, x = 122 successes out of 
n = 355 trials. Assuming a simple Bernoulli trial 
model with success probability $\theta$, the goal is to test 
$H_0: \theta = 1/4$ versus $H_1: \theta \ne 1/4$. 

To determine the Bayes factor here, one must 
specify $g(\theta)$, the conditional prior density on $H_1$. 
Consider choosing g to be uniform and symmetric, 
that is, 
bate. This debate is also a good example of how 
statistical criticism can be part of the scientific 
process and lead to better experiments and, in gen- 
eral, better science. 

The remainder of the paper addresses technical 
issues of meta-analysis, drawing upon recent research 
in parapsychology for an in-depth application. 
Through a series of examples, the author 
presents a convincing argument that power issues 
cannot be overlooked in successive replications and 
that comparison of effect sizes provides a richer 
alternative to the dichotomous measure inherent in 
the use of p-values. This is particularly relevant 
when the potential effect size is small and re- 
sources are limited, as seems to be the case for psi 
studies. 

The concluding section briefly mentions Bayesian 
techniques. As noted by the author, Bayes (or empirical 
Bayes) methodology seems to make sense for 
research in parapsychology. This discussion examines 
possible Bayesian approaches to meta-analysis 
in this field. 

BAYES MODELS FOR PARAPSYCHOLOGY 

The notion of repeatability maps well into the 
Bayesian set-up in which experiments, viewed as a 
random sample from some superpopulation of ex- 
periments, are assumed to be exchangeable. When 
subjects can also be viewed as an approximately 
random sample from some population, it is appro- 
priate to pool them across experiments. Otherwise, 
analyses that partially pool information according 
to experimental heterogeneity need to be consid- 
ered. Empirical and hierarchical Bayes methods 
offer a flexible modeling framework for such analy- 
ses, relying on empirical or subjective sources to 
determine the degree of pooling. These richer meth- 
ods can be particularly useful to meta-analysis of 
experiments in parapsychology conducted under 
potentially diverse conditions. 

For the recent ganzfeld series, assuming them 
to be independent binomially distributed as dis- 
cussed in Section 5, the data can be summed 
(pooled) across series to estimate a common hit 
rate. Honorton et al. (1990) assessed the homogene- 
ity of effects across the 11 series using a chi-square 
test that compares individual effect sizes to 
the weighted mean effect. The chi-square statistic 
$\chi^2_{10} = 16.25$, not statistically significant (p = 
0.093), largely reflects the contribution of the last 
“special” series (contributes 9.2 units to the $\chi^2_{10}$ 
value), and to a lesser extent the novice series with 
a negative effect (contributes 2.5 units). The outlier 


effects for this data (this result is reported in Section 6). 
For the remaining 10 series, the chi-square 
value $\chi^2_9 = 7.01$ strongly favors homogeneity, although 
more than one-third of its value is due to 
the novice series (number 4 in Table 1). This pattern 
points to the potential usefulness of a richer 
model to accommodate series that may be distinct 
from the others. For the earlier ganzfeld data analyzed 
by Honorton (1985b), the appeal of a Bayes or 
other model that recognizes the heterogeneity 
across studies is clear cut: $\chi^2_{23} = 66.6$, p ~ …, 
where only those studies with common chance hit 
rate have been included (see Table 2). 
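
The p-value quoted for the homogeneity test of the 11 recent series (chi-square of 16.25 on 10 degrees of freedom) can be verified without special libraries, because the chi-square upper tail has a closed form when the degrees of freedom are even. The following Python sketch uses that identity; the function name is hypothetical.

```python
from math import exp, factorial

def chi2_upper_tail_even_df(x, df):
    """P(chi-square_df > x) for even df, using the closed-form Poisson sum."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only applies to positive even df")
    half = x / 2
    return exp(-half) * sum(half**k / factorial(k) for k in range(df // 2))

p = chi2_upper_tail_even_df(16.25, 10)
print(f"p = {p:.3f}")
```

The closed form does not cover odd degrees of freedom, so it is not applied here to the 10-series statistic.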

Historic reliance on voting-count approaches to 
determine the presence of psi effects makes it natu- 
ral to consider Bayes models that focus on the 
ensemble of experimental effects from parapsycho- 
logical studies, rather than individual estimates. 
Recent work in parapsychology that compares effect 
across studies, rather than estimating 
separate study effects, reinforces the need to examine 
this type of model. Louis (1984) develops Bayes 
and empirical Bayes methods for problems that 
consider the ensemble of parameter values to be 
the primary goal, for example, multiple comparisons. 
For the simple compound normal model, 
$Y_i \sim N(\theta_i, 1)$, $\theta_i \sim N(\mu, \tau^2)$, the standard Bayes 
estimates (posterior means) 

$$\hat{\theta}_i = \mu + D(Y_i - \mu) \quad \text{and} \quad D = \frac{\tau^2}{1 + \tau^2},$$ 

where the $\theta_i$ represent experimental effects of interest, 
are modified approximately to 

$$\tilde{\theta}_i \approx \mu + \sqrt{D}\,(Y_i - \mu)$$ 

when an ensemble loss function is assumed. The 
new estimates adjust the shrinkage factor $D$ so 
that their sample mean and variance match the 
posterior expectation and variance of the $\theta$’s. Similar 
results are obtained when the model is gener- 


Table 1 
Recent ganzfeld series 

Series type    N Trials    Hit rate     Y_i      σ_i 
Pilot             22         0.36      -0.58     0.44 
Pilot              9         0.33      -0.71     0.71 
Pilot             36         0.28      -0.94     0.37 
Novice            50         0.24      -1.15     0.33 
Novice            50         0.36      -0.58     0.30 
Novice            50         0.30      -0.85     0.31 
Novice            50         0.36      -0.58     0.30 
Novice             6         0.67       0.71     0.87 
Experienced        7         0.43      -0.28     0.76 
Experienced       50         0.30      -0.85     0.31 
Experienced       25         0.64       0.58 
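
The last two columns of the table appear to be the logit of the hit rate and an approximate standard error on the logit scale. That reading is inferred from the numbers and is not stated in the surviving text. Under that assumption, a short Python sketch reproduces several rows to two decimals; the function name is hypothetical.

```python
from math import log, sqrt

def logit_and_se(n_trials, hit_rate):
    """Logit of a hit rate and its approximate (delta-method) standard error."""
    y = log(hit_rate / (1 - hit_rate))
    se = 1 / sqrt(n_trials * hit_rate * (1 - hit_rate))
    return y, se

# A few (series type, N trials, hit rate) rows copied from Table 1.
rows = [("Pilot", 22, 0.36), ("Novice", 50, 0.24), ("Novice", 6, 0.67)]
for series, n, rate in rows:
    y, se = logit_and_se(n, rate)
    print(f"{series:<8} N={n:<3} rate={rate:.2f}  logit={y:6.2f}  se={se:.2f}")
```

Not every tabulated standard error need match this formula exactly, since the hit rates in the table are themselves rounded.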
maximum likelihood estimation that modify the 
sampling error distribution to yield estimates that 
are “robust” against outlying observations. 

Like its maximum likelihood counterparts, in addition 
to the robust effect estimates $\theta_i^*$, the Bayes 
model provides (posterior) scale estimates $\gamma_i^*$. These 
can be interpreted as the weight given to the data 
for each $\theta_i$ in the analysis and are useful for diagnosing 
which model components (series or studies) 
are unusual and how they influence the shrinkage. 
When more complex groupings among the $\theta_i$ are 
suspected, for example, bimodal distribution of 
studies from different sites or experimenters, other 
mixture specifications can be used to further relax 
the shrinkage toward a common value. 

For the 11 ganzfeld series, the last “outlier” 
series, quite distinct from the others (hit rate = 
0.64), is moderately precise (N = 25). Omitting it 
from the analysis causes the overall hit rate to drop 
from 0.344 to 0.321. The scale mixture model is a 
compromise between these two values (on the logit 
scale), discounting the influence of series 11 on the 
estimated posterior common hit rate used for 
shrinkage. The scale factor $\gamma_{11}^*$, an indication of 
how separate $\theta_{11}$ is from the other parameters, also 
causes $\theta_{11}^*$ to be shrunk less toward the common hit 
rate than other, more homogeneous $\theta_i$, giving more 
weight to individual information for that series (see 
West, 1985). The heterogeneity of the earlier 
ganzfeld data is more pronounced, and studies are 
taken from a variety of sources over time. For these 
data, the $\gamma_i^*$ can be used to explore atypical studies 
(e.g., study 6, with hit rate = 0.90, contributes more 
than 25% to the $\chi^2_{23}$ value for homogeneity) and 
groupings among effects, as well as protect the 
analysis from misspecification of second-stage 
normality. 

Variation among ganzfeld series or studies and 
the degree to which pooling or shrinking is appropriate 
can be investigated further by considering a 
range of priors for $\tau^2$. If the marginal likelihood of 
$\tau^2$ dominates the prior specification, then results 
should not vary as the prior for $\tau^2$ is varied. Otherwise, 
it is important to identify the degree to which 
subjective information about interexperimental 
variability influences the conclusions. This sensitivity 
analysis is a Bayesian enrichment of 
the simpler test of homogeneity directed toward 
determining whether or not complete pooling is 
appropriate. 

To assess how well heterogeneity among his- 
torical control groups is determined by the data. 
Dempster, Selwyn and Weeks (1983) propose three 
priors for $\tau^2$ in the logistic-normal model. The prior
distributions range from strongly favoring individ-
ual estimates, $p(\tau^2)\,d\tau \propto \tau^{-1}$, to the uniform refer-
ence prior $p(\tau^2)\,d\tau \propto \tau^{-2}$, flat on the log $\tau$ scale, to
strongly favoring complete pooling, $p(\tau^2)\,d\tau \propto \tau^{-3}$
(the latter forcing complete pooling for the com-
pound normal model; see Morris, 1983). For their
two examples, the results (estimates of linear treat- 
ment effects) are largely insensitive to variation in 
the prior distribution, but the number of studies in 
each example was large (70 and 19 studies avail-
able for pooling). For the 11 ganzfeld series, $\tau^2$ may
be less well determined by the data. The posterior
estimate of $\tau^2$ and its sensitivity to $p(\tau^2)\,d\tau$ will
also depend on whether individual scale parame-
ters are incorporated into the model. Discounting
the influence of the last series will both shift the
marginal likelihood toward smaller values of $\tau$
and concentrate it more in that region.

The issue of objective assessment of experiment 
results is one that extends well beyond the field of 
parapsychology, and this paper provides insight into 
issues surrounding the analysis and interpretation 
of small effects from related studies. Bayes meth- 
ods can contribute to such meta-analyses in two 
ways. They permit experimental and subjective evi- 
dence to be formally combined to determine the 
presence or absence of effects that are not clear cut 
or controversial (e.g., psi abilities). They can also 
help uncover sources and degree of uncertainty in 
the scientific conclusions. 
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advances methodically and objectively through the 
accumulation of knowledge (or the rejection of false 
knowledge) derived from the implementation of the 
scientific method. But, as we will see, there is more 
to the acceptance of new scientific discoveries than 
the systematic accumulation and evaluation of 
facts. The recognition that there is a social process 
involved with the acceptance or rejection of scien- 
tific knowledge has been the subject of study of 
sociologists for some time. The scientific commu- 
nity’s rejection of the existence of paranormal phe- 
nomena is an excellent case study of this process 
(Allison, 1979; Collins and Pinch, 1979). 

Implicit in Professor Utts’ presentation and 
paramount to the acceptance of parapsychology as 
a legitimate science are the description and docu- 
mentation of the professionalization of the field of 
parapsychology. It is true that many researchers in 
the field have university appointments; there are 
organized professional societies for the advance-
ment of parapsychology; there are journals with
rigorous standards for published research; the field 
has received funding from federal agencies; and 
parapsychology has received recognition from other 
professional societies, such as the IMS and the 
American Association for the Advancement of Sci- 
ence (Collins and Pinch, 1979). Nevertheless, most 
readers of Statistical Science would agree that 
parapsychology is not accepted as part of orthodox 
science and is considered by most of the scientific 
community to be on the margins of science, at best 
(Allison, 1979; Collins and Pinch, 1979). Why is 
this the case? Professor Utts believes that it is 
because people have not examined the data. She 
states that “Strong beliefs tend to be resistant to 
change even in the face of data, and many people, 
scientists included, seem to have made up their 
minds on the question without examining any em- 
pirical data at all.” 

The history of science is replete with examples of 
resistance by the established scientific community 
to new discoveries. A challenging problem for sci- 
ence is to understand the process by which a new 
theory or discovery becomes accepted by the com- 
munity of scientists and, likewise, to characterize 
the nature of the resistance to new ideas. Barber 
(1961) suggests that there are many different 
sources of resistance to scientific discovery. In 1900, 
for example, Karl Pearson met resistance to his use 
of statistics in applications to biological problems, 
illustrating a source of resistance due to the use of 
a particular methodology. The Royal Society in- 
formed Pearson that future papers submitted to the 
Society for publication must keep the mathematics 


entific ideas, and the one referred to by Professor 
Utts above, is the prevailing substantive beliefs 
and theories held by scientists at any given time. 
Barber offers the opposition to Copernicus and his 
heliocentric theory and to Mendel’s theory of ge- 
netic inheritance as examples of how, because of 
preconceived ideas, theories and values, scientists 
are not as open-minded to new advances as one 
might think they should be. It was R. A. Fisher 
who said that each generation seems to have found 
in Mendel’s paper only what it expected to find and 
ignored what did not conform to its own expecta- 
tions (Fisher, 1936). 

Pearson’s response to the antimathematical prej- 
udice expressed by the Royal Society was to estab- 
lish with Galton’s support a new journal, 
Biometrika, to encourage the use of mathematics in
biology. Galton (1901) wrote an article for the first 
issue of the journal, explaining the need for this 
new voice of “mutual encouragement and support” 
for mathematics in biology and saying that “a new 
science cannot depend on a welcome from the fol- 
lowers of the older ones, and [therefore] ... it is 
advisable to establish a special Journal for Biome- 
try.” Lavoisier understood the role of preconceived 
beliefs as a source of resistance when he wrote in 
1785, 

I do not expect my ideas to be adopted all at 
once. The human mind gets creased into a way 
of seeing things. Those who have envisaged 
nature according to a certain point of view 
during much of their career, rise only with 
difficulty to new ideas. (Barber, 1961.) 

I suspect that this paper by Professor Utts syn- 
thesizing the accumulation of research results sup- 
porting the existence of paranormal phenomena 
will continue to be received with skepticism by the 
orthodox scientific community “even after examin- 
ing the data.” In part, this resistance is due to the 
popular perception of the association between para- 
psychology and the occult (Allison, 1979) and due 
to the continued suspicion and documentation of 
fraud in parapsychology (Diaconis, 1978). An addi- 
tional and important source of resistance to the 
evidence presented by Professor Utts, however, is 
the lack of a model to explain the phenomena.
Psychic phenomena are unexplainable by any cur- 
rent scientific theory and, furthermore, directly 
contradict the laws of physics. Acceptance of psi 
implies the rejection of a large body of accumulated 
evidence explaining the physical and biological 
world as we know it. Thus, even though the effect 
size for a relationship between aspirin and the 
prevention of heart attacks is three times smaller
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of a discipline it turns to meta-analysis to answer 
research questions or to resolve controversy (e.g., 
Greenhouse et al., 1990). 

One argument for combining information from 
different studies is that a more powerful result can 
be obtained than from a single study. This objective 
is implicit in the use of meta-analysis in parapsy- 
chology and is the force behind Professor Utts’ 
paper. The issue is that by combining many small 
studies consisting of small effects there is a gain in 
power to find an overall statistically significant 
effect. It is true that the meta-analyses reported by 
Professor Utts find extremely small p-values, but 
the estimate of the overall effect size is still small. 
As noted earlier, because of the small magnitude of 
the overall effect size, the possibility that other 
extraneous variables might account for the rela- 
tionship remains. 

Professor Utts, however, also illustrates the use 
of meta-analysis to investigate how studies differ 
and to characterize the influence of difficult covari- 
ates or moderating variables on the combined esti- 
mate of effect size. For example, she compares the 
mean effect size of studies where subjects were 
selected on the basis of good past performance to 
studies where the subjects were unselected, and she 
compares the mean effect size of studies with feed- 
back to studies without feedback. To me, this latter 
use of meta-analysis highlights the more valuable 
and important contribution of the methodology. 
Specifically, the value of quantitative methods for 


Comment 

Ray Hyman 


Utts concludes that “there is an anomaly that 
needs explanation.” She bases this conclusion on 
the ganzfeld experiments and four meta-analyses of 
parapsychological studies. She argues that both 
Honorton and Rosenthal have successfully refuted 
my critique of the ganzfeld experiments. The meta- 
analyses apparently show effects that cannot be 
explained away by unreported experiments nor 
over-analysis of the data. Furthermore, effect size 
does not correlate with the rated quality of the 
experiment. 
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research synthesis is in assessing the potential ef- 
fects of study characteristics and to quantify the 
sources of heterogeneity in a research domain, that 
is, to study systematically the effects of extraneous 
variables. Tom Chalmers and his group at Harvard 
have used meta-analysis in just this way not only 
to advance the understanding of the effectiveness of 
medical therapies but also to study the characteris- 
tics of good research in medicine, in particular, the 
randomized controlled clinical trial. (See Mosteller 
and Chalmers, 1991, for a review of this work.) 

Professor Utts should be congratulated for her 
courage in contributing her time and statistical 
expertise to a field struggling on the margins of 
science, and for her skill in synthesizing a large 
body of experimental literature. I have found her 
paper to be quite stimulating, raising many inter- 
esting issues about how science progresses or does 
not progress. 
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Neither time nor space is available to respond in 
detail to her argument. Instead, I will point to 
some of my concerns. I will do so by focusing on 
those parts of Utts’ discussion that involve me. 
Understandably, I disagree with her assertions that 
both Honorton and Rosenthal successfully refuted
my criticisms of the ganzfeld experiments. 

Her treatment of both the ganzfeld debate and 
the National Research Council’s report suggests 
that Utts has relied on second-hand reports of the 
data. Some of her statements are simply inaccu- 
rate. Others suggest that she has not carefully read 
what my critics and I have written. This remote- 
ness from the actual experiments and details of the 
nv>/nt*v>nrtfr WOU nOt*flQ IT fhr her oDtimistic 
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us. Harris and Rosenthal were commissioned by 
our evaluation subcommittee to write a paper on 
evaluation issues, especially those related to exper- 
imenter effects. On their own initiative, Harris and 
Rosenthal surveyed a number of data bases to illus- 
trate the application of methodological procedures 
such as meta-analysis. As one illustration, they 
included a meta-analysis of the subsample of 
ganzfeld experiments used by Honorton in his 
rebuttal to my critique. 

Because Harris and Rosenthal did not them- 
selves do a first-hand evaluation of the ganzfeld 
experiments, and because they used Honorton’s rat-
ings for their illustration, I did not refer to their
analysis when I wrote my draft for the chapter on 
the paranormal. Rosenthal told me, in a letter, that 
he had arbitrarily used Honorton’s ratings rather 
than mine because they were the most recent avail- 
able. I assumed that Harris and Rosenthal were 
using Honorton’s sample and ratings to illustrate 
meta-analytic procedures. I did not believe they 
were making a substantive contribution to the 
debate. 

Only after the committee’s complete report was 
in the hands of the editors did someone become 
concerned that Harris and Rosenthal had come to a 
conclusion on the ganzfeld experiments different 
from the committee. Apparently one or more com- 
mittee members contacted Rosenthal and asked him 
to explain why he and Harris were dissenting. 

Because some committee members believed that 
we should deal with this apparent discrepancy, I 
contacted Rosenthal and pointed out that if he had used
my ratings with the very same analysis he had 
applied to Honorton’s ratings, he would have 
reached a conclusion opposite to what Harris and 
he had asserted. I did this, not to suggest my 
ratings were necessarily more trustworthy than 
Honorton’s, but to point out how fragile any conclu- 
sions were based on this small and limited sample. 
Indeed, the data were so lacking in robustness that 
the difference between my rating and Honorton’s 
rating of one investigator (Sargent) on one at- 
tribute (randomization) sufficed to reverse the con- 
clusions Harris and Rosenthal made about the 
correlation between quality and effect size. 

Harris and Rosenthal responded by adding a foot- 
note to their paper. In this footnote, they repor- 
ted an analysis using my ratings rather than 
Honorton’s. This analysis, they concluded, still sup- 
ported the null hypothesis of no correlation be- 
tween quality and effect size. They used 6 of my 12 
dichotomous ratings of flaws as predictors and the z 
score and effect size as criterion variables in both 
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lation between criterion variables and flaws of 
“only” 0.46. A true correlation of this magnitude
would be impressive given the nature and split of 
the dichotomous variables. But, because it was not 
statistically significant, Harris and Rosenthal con- 
cluded that there was no relationship between 
quality and effect size. A canonical correlation on 
this sample of 28 nonindependent cases, of course, 
has virtually no chance of being significant, even if 
it were of much greater magnitude. 

What this amounts to is that the alleged contra- 
dictory conclusions of Harris and Rosenthal are 
based on a meta-analysis that supports Honorton’s 
position when Honorton’s ratings are used and 
supports my position when my ratings are used. 
Nothing substantive comes from this, and it is 
redundant with what Honorton and I have already 
published. Harris and Rosenthal’s footnote adds 
nothing because it supports the null hypothesis 
with a statistical test that has no power against a 
reasonably sized alternative. It is ironic that Utts, 
after emphasizing the importance of considering 
statistical power, places so much reliance on the 
outcome of a powerless test. 

(I should add that the recurrent charge that the 
NRC committee completely ignored Harris and 
Rosenthal’s conclusions is not strictly correct. I 
wrote a response to the Harris and Rosenthal paper 
that was included in the same supplementary 
volume that contains their commissioned paper.) 

Utts’ discussion of the ganzfeld debate, as I have 
indicated, also shows unfamiliarity with details. 
She cites my factor analysis and Saunders’ critique 
as if these somehow jeopardized the conclusions I 
drew. Again, the matter is too complex to discuss 
adequately in this forum. The “factor analysis” she 
is talking about is discussed in a few pages of my 
critique. I introduced it as a convenient way to 
summarize my conclusions, none of which depended 
on this analysis. I agree with what Saunders has to 
say about the limitations of factor analysis in this 
context. Unfortunately, Saunders bases his criti- 
cism on wrong assumptions about what I did and 
why I did it. His dismissal of the results as 
“meaningless” is based on mistaken algebra. I in- 
cluded as dummy variables five experimenters in 
the factor analysis. Because an experimenter can 
only appear on one variable, this necessarily forces 
the average intercorrelation among the experi- 
menter variables to be negative. Saunders falsely 
asserts that this negative correlation must be -1.
If he were correct, this would make the results 
meaningless. But he could be correct only if there 
were just two investigators and that each one ac- 
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Comment 

Robert L. Morris 

Experimental sciences by their nature have found 
it relatively easy to deal with simple closed sys-
tems. When they come to study more complex, open
systems, however, they have more difficulty in gen-
erating testable models, must rely more on multi-
variate approaches, have more diversity from
experiment to experiment (and thus more difficulty 
in constructing replication attempts), have more 
noise in the data, and more difficulty in construct- 
ing a linkage between concept and measurement. 
Data gatherers and other researchers are more 
likely to be part of the system themselves. Exam- 
ples include ecology, economics, social psychology 
and parapsychology. Parapsychology can be re- 
garded as the study of apparent new means of 
communication, or transfer of influence, between
organism and environment. Any observer attempt- 
ing to decide whether or not such psychic communi- 
cation has taken place is one of several elements in 
a complex open system composed of an indefinite
number of interactive features. The system can be 
modeled, as has been done elsewhere (e.g., Morris, 
1986) such as to organise our understanding of how 
observers can be misled by themselves, or by delib- 
erate frauds. Parapsychologists designing experi- 
mental studies must take extreme care to ensure 
that the elements in the experimental system do 
not interact in unanticipated ways to produce arti- 
fact or encourage fraudulent procedures. When re- 
searchers follow up the findings of others, they 
must ensure that the new experimental system 
sufficiently resembles the earlier one, regarding its 
important components and their potential interac-
tions. Specifying sufficient resemblance is more dif-
ficult in complex and open systems, and in areas of
research using novel methodologies. 

As a result, parapsychology and other such areas
may well profit from the application of modern 
meta-analysis, and meta-analytic methods may in 
turn profit from being given a good stiff workout by 
controversial data bases, as suggested by Jessica 
Utts in her article. Parapsychology would appear to 
gain from meta-analytic techniques, in at least 
three important areas. 

First, in assessing the question of replication 
rate, the new focus on effect size and confidence 


Robert L. Morris occupies the Koestler Chair of 


intervals rather than arbitrarily chosen signifi- 
cance levels seems to indicate much greater consis- 
tency in the findings than has previously been 
claimed. 

Second, when one codes the individual studies for 
flaws and relates flaw abundance with effect size, 
there appears to be little correlation for all but one
data base. This contradicts the frequent assertion 
that parapsychological results disappear when 
methodology is tightened. Additional evidence on 
this point is the series of studies by Honorton and 
associates using an automated ganzfeld procedure, 
apparently better conducted than any of the previ- 
ous research, which nevertheless obtained an effect 
size very similar to that of the earlier more diverse 
database. 

Third, meta-analysis allows researchers to look 
at moderator variables, to build a clearer picture of 
the conditions that appear to produce the strongest 
effects. Research in any real scientific discipline 
must be cumulative, with later researchers build- 
ing on the work of those who preceded them. If our 
earlier successes and failures have meaning, they 
should help us obtain increasingly consistent, 
clearer results. If psychic ability exists and is suffi-
ciently stable that it can be manifest in controlled
experimental studies, then moderator variables 
should be present in groups of studies that would 
indicate conditions most favourable and least 
favourable to the production of large effect sizes. 
From the analyses presented by Utts, for instance, 
it seems evident that group studies tend to produce 
poor results and, however convenient it may be to 
conduct them, future researchers should apparently 
focus much more on individual testing. When doing 
ganzfeld studies, it appears best to work with dy- 
namic rather than static target material and with 
experienced participants rather than novices. If 
such results are valid, then future researchers who 
wish to get strong results now have a better idea of 
what procedures to select to increase the likelihood 
of so doing, what elements in the experimental 
system seem most relevant. The proportion of stud- 
ies obtaining positive results should therefore 
increase. 

However, the situation may be more complex 
than the somewhat ideal version painted above. As 
noted earlier, meta-analysis may learn from para- 
psychology as well as vice versa. Parapsychological 

data may well give meta-analytic techniques a goo 
• .3 1 1 *>rten crime' challenges. 


Approved For Release 2003/04/18 : CIA-RDP96-00789R002700020001-0 




REPLICATION OF PARAPSYCHOLOGY 


misses estimated; perhaps Cohen’s h greatly un-
derestimates effect size when very low probability
events (less than 1 in 50 for heart attack in the
placebo condition and less than 1 in 100 for
aspirin) are involved. I’m not a statistician and 
thus don’t know if there is a relevant literature on 
this point. 
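
As a rough illustration of the point about low-probability events, Cohen's h can be computed directly from two proportions as the difference of their arcsine-transformed values. The sketch below uses the approximate rates mentioned above (1 in 50 and 1 in 100) as assumed round figures rather than the exact trial counts, which are not given here, and the function name is hypothetical. For comparison it also evaluates h for the overall ganzfeld hit rate of 0.344 quoted elsewhere in this discussion against the chance rate of 0.25.

```python
import math


def cohens_h(p1, p2):
    """Cohen's h: difference between arcsine-transformed proportions."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))


# Illustrative rates only (assumed round figures, not the trial's exact counts).
h_aspirin = cohens_h(1 / 50, 1 / 100)  # placebo vs. aspirin heart-attack rates
h_ganzfeld = cohens_h(0.344, 0.25)     # quoted hit rate vs. chance

print(f"h (aspirin illustration):  {h_aspirin:.3f}")
print(f"h (ganzfeld illustration): {h_ganzfeld:.3f}")
```

Under these assumed inputs, halving a rare event rate gives a small h even though the relative risk is large, which is the property the paragraph above is asking about.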
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The above objections should not detract from the 
overall value of the Utts survey. The findings she 
reports will need to be replicated; but even as is,
they provide a challenge to some of the cherished
arguments of counteradvocates, yet also challenge
serious researchers to use these findings effectively
as guidelines for future studies.


Comment 

Frederick Mosteller 

Dr. Utts’s discussion stimulates me to offer some 
comments that bear on her topic but do not, in the 
main, fall into an agree-disagree mode. My refer- 
ences refer to her bibliography. 

Let me recommend J. Edgar Coover’s work to 
statisticians who would like to read about a pretty 
sequence of experiments developed and executed 
well before Fisher’s book on experimental design 
appeared. Most of the standard kinds of ESP exper- 
iments (though not the ganzfeld) are carried out 
and reported in this 1917 book. Coover even began 
looking into the amount of information contained 
in cues such as whispers. He also worked at expos- 
ing mediums. I found the book most impressive. As 
Utts says in her article, the question of significance 
level was a puzzling one, and one we still cannot 
solve even though some fields seem to have stan- 
dardized on 0.05. 

When Feller’s comments on Stuart and Green- 
wood’s sampling experiments came out in the first 
edition of his book, I was surprised. Feller devotes
a problem to the results of generating 25 symbols 
from the set a, b, c, d and e (page 45, first edition) 
using random numbers with 0 and 1 corresponding 
to a, 2 and 3 to b, etc. He asks the student to find 
out how often the 25 produce 5 of each symbol. He 
asks the student to check the results using random 
number tables. The answer seems to be about 1 
chance in 600. In a footnote Feller then says “They 
[random numbers] are occasionally extraordinarily 
obliging: c.f. J. A. Greenwood and E. E. Stuart, 
Review of Dr. Feller’s Critique, Journal of Para- 
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psychology, vol. 4 (1940), pp. 298-319, in particular 
p. 306.” The 25 symbols of 5 kinds, 5 of each, 
correspond to the cards in a parapsychology deck. 
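
The probability in Feller's exercise can also be computed exactly instead of being checked against a random number table. The sketch below assumes each of the 25 draws is independent and uniform over the five symbols (which is what the digit-pair scheme gives) and counts the arrangements with exactly five of each symbol using a multinomial coefficient. The function name is hypothetical.

```python
from math import factorial


def prob_equal_counts(n_symbols=5, per_symbol=5):
    """Probability that uniform independent draws yield each symbol equally often."""
    total = n_symbols * per_symbol
    # Multinomial coefficient: orderings with exactly `per_symbol` of each symbol.
    arrangements = factorial(total) // factorial(per_symbol) ** n_symbols
    return arrangements / n_symbols ** total


p = prob_equal_counts()
print(f"P(5 of each symbol) = {p:.5f}, about 1 in {1 / p:.0f}")
```

Under these assumptions the value comes out near 0.0021, roughly 1 in 480, which is of the same order as the figure quoted above.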

The point of page 306 is that Greenwood and
Stuart on that page claim to have generated two
random orders of such a deck using Tippett’s table
of random numbers. Apparently Feller thought that
it would have taken them a long time to do it. If
one assumes that Feller’s way of generating a ran-
dom shuffle is required, then it would indeed be
unreasonable to suppose that the experiments could
be carried out quickly. I wondered then whether
Feller thought this was the only way to produce a
random order to such a deck of cards. If you happen
to know how to shuffle a deck efficiently using
random numbers, it is hard to believe that others
do not know. I decided to test it out and so I
proposed to a class of 90 people in mathematical
statistics that we find a way of using random num-
bers to shuffle a deck of cards. Although they were
familiar with random numbers, they could not come
up with a way of doing it, nor did anyone after class
come in with a workable idea though several stu-
dents made proposals. I concluded that inventing
such a shuffling technique was a hard problem and
that maybe Feller just did not know how at the
time of writing the footnote. My face-to-face at-
tempts to verify this failed because his response
was evasive. I also recall Feller speaking at a
scientific meeting where someone had complained
about mistakes in published papers. He said essen-
tially that we won’t have any literature if mistakes


are disallowed and further claimed that he always 
had mistakes in his own papers, hard as he tried to 
avoid them. It was fun to hear him speak. 
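
One well-known way to shuffle a deck efficiently with random numbers is the Fisher-Yates procedure, which needs only one random draw per card. The sketch below is offered as an illustration of that technique, not as the method Greenwood and Stuart used or the one I had in mind in class; the function name, the deck encoding and the seed are assumptions made for the example.

```python
import random


def shuffle_deck(deck, rng):
    """Return a uniformly random ordering of `deck` (Fisher-Yates shuffle)."""
    cards = list(deck)
    # Walk from the last position down, swapping each card with a randomly
    # chosen card at or before it; one random number is used per position.
    for i in range(len(cards) - 1, 0, -1):
        j = rng.randrange(i + 1)
        cards[i], cards[j] = cards[j], cards[i]
    return cards


# A 25-card deck with five symbols, five of each (symbols a-e as in Feller's exercise).
zener_deck = [symbol for symbol in "abcde" for _ in range(5)]
shuffled = shuffle_deck(zener_deck, random.Random(1940))
print("".join(shuffled))
```

Because every ordering retains exactly five of each symbol, no draws are wasted waiting for a balanced sequence to turn up by chance.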

Although I find Utts’s discussion of replication 
engaging as a problem in human perception, I do 
always feel that people should not be expected to
carry out difficult mathematical exercises in their 
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ited PRL for several days and was a subject in 
Series 101” (pages 184-135]. 

Honorton has also informed me (personal communi- 
cation, July 25, 1991) that several self-proclaimed 
skeptics have visited his laboratory and received 
demonstrations of the autoganzfeld procedure and 
that no one expressed any concern with the secu- 
rity arrangements. 

This may not completely satisfy Professor Diaco- 
nis’ objections, but it does indicate a serious effort 
on the part of the researchers to involve such peo- 
ple. Further, the original publication of the re- 
search in Section 5 followed the reporting criteria 
established by Hyman and Honorton (1986), thus
providing much more detail for the reader than the
earlier published records to which Professor
Diaconis alludes.

Points Raised by Greenhouse 

Greenhouse enumerated four items that offer al- 
ternative explanations for the observed anomalous 
effects. Three of these (items 2-4) will be addressed
in this section by elaborating on the details pro- 
vided in my paper. His item 1 will be addressed in 
a later section. 

Item 2 on his list questioned the role of experi- 
menter expectancy effects as a potential confounder 
in parapsychological research. While the expecta- 
tions of the experimenter may influence the report- 
ing of results, the ganzfeld experiments (as well as 
other psi experiments) are conducted in such a way
that experimenter expectancy cannot account for
the results themselves. Rosenthal, whom Greenhouse
cites as the expert in this area, addressed this in 
his background paper for the National Research 
Council (Harris and Rosenthal, 1988a) and con- 
cluded that the ganzfeld studies were adequately 
controlled in this regard. He also visited the auto- 
ganzfeld laboratory and was given a demonstration 
of that procedure.

Greenhouse’s item 3, the question of what consti- 
tutes a direct hit, was addressed in my paper but 
perhaps needs elaboration. Although free-response 
experiments do generate substantial amounts of 
subjective data, the statistical analysis requires 
that the results for each trial be condensed into a 
single measure of whether or not a direct hit was 
achieved. This is done by presenting four choices to 
a judge (who of course does not know the correct 
answer) and asking the judge to decide which of the 
four best matches the subject’s response. If the 
judge picks the target, a direct hit has occurred.

It is true that different judges may differ on their 
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cal question is the same. Under the null hypothe- 
sis, since the target is randomly selected from the 
four possibilities presented, the probability of a 
direct hit is 0.25 regardless of who does the judg- 
ing. Thus, the observed anomalous effects cannot 
be explained by assuming there was an over- 
optimistic judge. 
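
To make the null model concrete, the number of direct hits in n independent trials follows a binomial distribution with success probability 0.25, whoever does the judging. The sketch below computes the exact upper-tail probability for a given number of hits. The trial and hit counts are hypothetical and chosen only for illustration, and the function name is an assumption.

```python
from math import comb


def binomial_tail(n, k, p=0.25):
    """P(X >= k) for X ~ Binomial(n, p): chance of at least k direct hits."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical session counts, not taken from any reported experiment.
n_trials, n_hits = 30, 12
p_value = binomial_tail(n_trials, n_hits)
print(f"P(at least {n_hits} hits in {n_trials} trials | chance) = {p_value:.4f}")
```

The judge's leniency does not enter the calculation: the judge must pick one of four candidates, and only the random choice of target fixes the chance level.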

If Professor Greenhouse is suggesting that the 
source of judging may be a moderating variable 
that determines the magnitude of the demonstrated 
anomalous effect, I agree. The parapsychologists 
have considered this issue in the context of whether 
or not subjects should serve as judges for their own 
sessions, with differing opinions in different labora- 
tories. This is an example of an area that has been 
suggested for further research. 

Finally, Greenhouse raised the question of the 
accuracy of the file-drawer estimates used in the 
reported meta-analyses. I agree that it is instruc- 
tive to examine the file-drawer estimate using more 
than one model. As an example, consider the 39 
studies from the direct hit and autoganzfeld data 
bases. Rosenthal’s fail-safe N estimates that there 
would have to be 371 studies in the file-drawer to 
account for the results. In contrast, the method 
proposed by Iyengar and Greenhouse gives a file- 
drawer estimate of 258 studies. Even this estimate 
is unrealistically large for a discipline with as few 
researchers as parapsychology. Given that the av- 
erage number of trials per experiment is 30, this 
would represent almost 8000 unreported trials, and 
at least that many hours of work. 
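
For readers who wish to reproduce this kind of calculation, the sketch below uses the usual statement of Rosenthal's fail-safe N: the number of unreported studies averaging z = 0 needed to pull the combined (Stouffer) z down to the one-tailed 0.05 critical value of 1.645. The z-scores in the example are hypothetical, since the individual study values are not listed here, and the function name is an assumption. The last lines check the trial arithmetic quoted above.

```python
def rosenthal_fail_safe_n(z_scores, z_crit=1.645):
    """Null studies needed to reduce the combined z to `z_crit`."""
    k = len(z_scores)
    total = sum(z_scores)
    return (total / z_crit) ** 2 - k


# Hypothetical z-scores for illustration only.
example_z = [2.1, 0.4, 1.8, -0.3, 2.6, 1.2]
print(f"fail-safe N = {rosenthal_fail_safe_n(example_z):.1f}")

# Arithmetic from the text: 258 file-drawer studies at 30 trials each.
unreported_trials = 258 * 30
print(unreported_trials)  # 7740, i.e. "almost 8000"
```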

There are pros and cons to any method of esti- 
mating the number of unreported studies, and the 
actual practices of the discipline in question should 
be taken into account. Recognizing publication bias 
as an issue, the Parapsychological Association has 
had an official policy since 1975 against the selec- 
tive reporting of positive results. Of the original 
ganzfeld studies reported in Section 4 of my paper, 
less than half were significant, and it is a matter of 
record that there are many nonsignificant studies 
and “failed replications” published in all areas of 
psi research. Further, the autoganzfeld database 
reported in Section 5 has no file-drawer. Given the 
publication practices and the size of the field, the 
proposed file-drawer cannot account for the ob- 
served effects. 

Points Raised by Hyman 

One of my goals in writing this paper was to 
present a fair account of recent work and debate in 
parapsychology. Thus, I was disturbed that Hy-
man, who has devoted much of his career to the
study of parapsychology, and who had first-hand
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and with the outcome canonical variable but three 
correlated negatively ” (page 2, italics added). 
Rosenthal (personal communication, July 23, 1991) 
verified that this was indeed the point he was 
trying to make. Readers who are interested in 
drawing their own conclusions from first-hand 
analyses can find Hyman’s original flaw codings in 
an Appendix to his paper (Hyman, 1985, pages 
44-49). 

Finally, in my paper, I stated that the parapsy- 
chology chapter of the National Research Council 
report critically evaluated statistically significant 
experiments, but not those that were nonsignifi-
cant. Professor Hyman “does not know how [I] got
such an impression,” so I will clarify by outlining 
some of the material reviewed in that report. There 
were surveys of three major areas of psi research: 
remote viewing (a particular type of free-response 
experiment), experiments with random number 
generators, and the ganzfeld experiments. As an 
example of where I got the impression that they 
evaluated only significant studies, consider the sec- 
tion on remote viewing. It began by referencing a 
published list of 28 studies. Fifteen of these were 
immediately discounted, since “only 13 . . . were 
published under refereed auspices” (Druckman and 
Swets, 1988, page 179). Four more were then dis- 
missed, since “Of the 13 scientifically reported 
experiments, 9 are classified as successful” (page 
179). The report continued by discussing these nine 
experiments, never again mentioning any of the 
remaining 19 studies. The other sections of the 
report placed similar emphasis on significant stud- 
ies. I did not think this was a valid statistical 
method for surveying a large body of research. 

Minor Point Raised by Morris 

The final clarification I would like to offer con- 
cerns the minor point raised by Professor Morris, 
that “When Honorton omitted studies that did not 
report direct hits as a measure, he may have biased 
his sample.” This possibility was explicitly addressed
by Honorton (1985, page 59). He examined 
what would happen if z-scores of zero were inserted 
for the 10 studies for which the number of direct 
hits was not measured, but could have been. He 
found that even with this conservative scenario, 
the combined z-score only dropped from 6.60 to 
5.67. 
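
The size of this drop can be checked with a short calculation. The sketch below assumes that the combined z-score was formed by Stouffer's unweighted method (the sum of the z-scores divided by the square root of the number of studies) and that 28 studies contributed to the original figure. Neither detail is stated in the passage above, so both are illustrative assumptions, and the function name is hypothetical.

```python
import math

def combined_z_after_padding(z_combined, k_original, k_added):
    """Stouffer combined z after adding k_added studies that each have z = 0."""
    # Stouffer's method: Z = sum(z_i) / sqrt(k).
    # Recover the sum of the original z-scores from the combined value.
    z_sum = z_combined * math.sqrt(k_original)
    # Studies with z = 0 leave the sum unchanged but enlarge the divisor.
    return z_sum / math.sqrt(k_original + k_added)

# Assumed inputs: 28 original studies, 10 added studies with z = 0.
padded = combined_z_after_padding(6.60, k_original=28, k_added=10)
print(round(padded, 2))  # prints 5.67
```

Under these assumptions the padded value agrees with the reported 5.67, which suggests, but does not establish, that a calculation of this kind was used.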

SATISFYING THE SKEPTICS 

Parapsychology is probably the only scientific 
discipline for which there is an organization of 
Paranormal (CSICOP) was established in 1976 by 
philosopher Paul Kurtz and sociologist Marcello 
Truzzi when “Kurtz became convinced that the 
time was ripe for a more active crusade against 
parapsychology and other pseudo-scientists” (Pinch 
and Collins, 1984, page 627). Truzzi resigned from 
the organization the next year (as did Professor 
Diaconis) “because of what he saw as the growing 
danger of the committee’s excessive negative zeal 
at the expense of responsible scholarship” (Collins 
and Pinch, 1982, page 84). In an advertising 
brochure for their publication The Skeptical In- 
quirer, CSICOP made clear its belief that paranor- 
mal phenomena are worthy of scientific attention 
only to the extent that scientists can fight the 
growing interest in them. Part of the text of the 
brochure read: “Why the sudden explosion of inter- 
est, even among some otherwise sensible people, in 
all sorts of paranormal ‘happenings’? . . . Ten years 
ago, scientists started to fight back. They set up an 
organization— The Committee for the Scientific In- 
vestigation of Claims of the Paranormal.” 

During the six years that I have been working 
with parapsychologists, they have repeatedly ex- 
pressed their frustration with the unwillingness of 
the skeptics to specify what would constitute ac- 
ceptable evidence, or even to delineate criteria for 
an acceptable experiment. The Hyman and Honor- 
ton Joint Communique was seen as the first major 
step in that direction, especially since Hyman was 
the Chair of the Parapsychology Subcommittee of 
CSICOP. 

Hyman and Honorton (1986) devoted eight pages 
to “Recommendations for Future Psi Experiments,” 
carefully outlining details for how the experiments 
should be conducted and reported. Honorton and 
his colleagues then conducted several hundred 
trials using these specific criteria and found essen- 
tially the same effect sizes as in earlier work for 
both the overall effect and effects with moderator 
variables taken into account. I would expect Profes- 
sor Hyman to be very interested in the results of 
these experiments he helped to create. While he did 
acknowledge that they “have produced intriguing 
results,” it is both surprising and disappointing 
that he spent only a scant two paragraphs at the 
end of his discussion on these results. 

Instead, Hyman seems to be proposing yet an- 
other set of requirements to be satisfied before 
parapsychology should be taken seriously. It is dif- 
ficult to sort out what those requirements should be 
from his account: “[They should] specify, in ad- 
vance, the complete sample space and the critical 
region. When they get to the point where they can 
specify this along with some boundary conditions 
cal studies, resulting from the observation by one 
physician that his lung cancer patients who smoked 
did not recover at the same rate as those who did 
not. There are many medications in common use 
for which there is still no medical explanation for 
their observed therapeutic effectiveness, but that 
does not prohibit their use. 

There are also examples where a coherent theory 
of a phenomenon was impossible because the re- 
quisite background information was missing. For 
instance, the current theory of endorphins as an 
explanation for the success of acupuncture would 
have been impossible before the discovery of endor- 
phins in the 1970s. 

Mosteller’s observation that ESP will not replace 
the telephone leads to the question of whether or 
not psi abilities are of any use even if they do exist, 
since the effects are relatively small. Again, a look 
at history is instructive. For example, in 1938 For- 
tune Magazine reported that “At present, few sci- 
entists foresee any serious or practical use for 
atomic energy.” 

Greenhouse implied that I think parapsychology 
is not accepted by more of the scientific community 
only because they have not examined the data, but 
this misses the main point I was trying to make. 
The point is that individual scientists are willing to 
express an opinion without any reference to data. 
The interesting sociological question is why they 
are so resistant to examining the data. One of the 
major reasons is undoubtedly the perception identi- 
fied by Greenhouse that there is some connection 
between parapsychology and the occult, or worse, 
religious beliefs. Since religion is clearly not in the 
realm of science, the very thought that parapsychology
might be a science leads to what psychologists
call “cognitive dissonance.” As noted by 
Griffin (1988), “People feel unpleasantly aroused 
when two cognitions are dissonant— when they con- 
tradict one another” (page 33). Griffin continued by 
observing that there are also external reasons for 
scientists to discount the evidence, since “It is gen- 
erally easier to be a skeptic in the face of novel 
evidence; skeptics may be overly conservative, but 
they are rarely held up to ridicule” (page 34). 

In summary, while it may be safer and more 
consonant with their beliefs for individual scien- 
tists to ignore the observed anomalous effects, the 
scientific community should be concerned with 
finding an explanation. The explanations proposed 
by Greenhouse and others are simply not tenable. 

REPLICATION AND MODELING 

Parapsychology is one of the few areas where a 
specify what should happen if there is no such 
thing as ESP by using simple binomial models, 
either to find p-values or Bayes factors. As noted 
by Mosteller, if there is no ESP, or other nonstatis- 
tical explanation for an effect, we should be able to 
carry out null experiments and get no effect. Other- 
wise, we should be worried about using these sim- 
ple models for other applications. 
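
As a minimal sketch of what the simple binomial model yields, the code below computes a one-sided p-value and a Bayes factor for a point null hypothesis (hit rate equal to chance) against an alternative that places a Beta prior on the hit rate. The Beta prior is an assumption chosen for convenience and is not the prior family used by Bayarri and Berger. The data (40 hits in 100 trials, with chance equal to 0.25) and the function names are hypothetical and serve only as illustration.

```python
import math

def binom_tail_p(hits, n, p0):
    """One-sided p-value: P(X >= hits) when X ~ Binomial(n, p0)."""
    return sum(math.comb(n, k) * p0**k * (1 - p0)**(n - k)
               for k in range(hits, n + 1))

def log_beta(a, b):
    """Natural log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def bayes_factor_null(hits, n, p0, a=1.0, b=1.0):
    """Bayes factor in favor of H0: p = p0 against H1: p ~ Beta(a, b).

    The binomial coefficient is common to both marginal likelihoods
    and cancels, so it is omitted.
    """
    log_null = hits * math.log(p0) + (n - hits) * math.log(1 - p0)
    log_alt = log_beta(hits + a, n - hits + b) - log_beta(a, b)
    return math.exp(log_null - log_alt)

# Hypothetical data, not taken from any experiment discussed here.
hits, n, chance = 40, 100, 0.25
print("one-sided p-value:", binom_tail_p(hits, n, chance))
print("Bayes factor for the null:", bayes_factor_null(hits, n, chance))
```

A Bayes factor below 1 favors the alternative and a value above 1 favors the null. Changing the prior parameters a and b shows how strongly the result depends on the choice of prior.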

Greenhouse, in his first alternative explanation 
for the results, questioned the use of these simple 
models, but his criticisms do not seem relevant to 
the experiments discussed in Section 5 of my paper. 
The experiments to which he referred were either 
poorly controlled, in which case no statistical anal- 
ysis could be valid, or were specifically designed to 
incorporate trial by trial feedback in such a way 
that the analysis needed to account for the added 
information. Models and analyses for such experi- 
ments can be found in the references given at the 
end of Diaconis' discussion. 

For the remainder of this discussion, I will con- 
fine myself to models appropriate for experiments 
such as the autoganzfeld described in Section 5. It 
is this scenario for which Bayarri and Berger com- 
puted Bayes factors, and for which Dawson dis- 
cussed possible Bayesian models. 

If ESP does exist, it is undoubtedly a gross over- 
simplification to use a simple non-null binomial 
model for these experiments. In addition to poten- 
tial differences in ability among subjects, there 
were also observed differences due to dynamic ver- 
sus static targets, whether or not the sender was a 
friend, and how the receiver scored on measures of 
extraversion. All of these differences were antici- 
pated in advance and could be incorporated into 
models as covariates. 

It is nonetheless instructive to examine the Bayes 
factor computed by Bayarri and Berger for the 
simple non-null binomial model. First, the observed 
anomalous effects would be less interesting if the 
Bayes factor was small for reasonable values of r, 
as it was for the random number generator experi- 
ments analyzed by Jefferys (1990), most of which 
purported to measure psychokinesis instead of ESP. 
Second, the Bayes factor provides a rough measure 
of the strength of the evidence against the null 
hypothesis and is a much more sensible summary 
than the p-value. The Bayes factors provided by 
Bayarri and Berger are probably more conserva- 
tive, in the sense of favoring the null hypothesis, 
than those that would result from priors elicited 
from parapsychologists, but are probably reasonable
for those who know nothing about past observed
effects. I expect that most parapsychologists 
would not opt for a prior symmetric around chance, 
THE ENHANCED HUMAN PERFORMANCE PROJECT: 
AN ASSESSMENT OF THE EFFORT TO DATE 



PROJECT REVIEW GROUP 
14 APRIL, 1987 

At the request of MG Philip K. Russell, MC, Commander, United States Army Medical 
Research and Development Command, the following individuals met at the Pentagon on 6 
March 1987 to assess the work of the Enhanced Human Performance Project: 

Ms. Amoretta Hoeber, TRW 
Dr. Jack Vorona, DIA 

Dr. Michael A. Wartell, Humboldt State University 
Dr. Nick Yarn, Consultant (Chairman) 

Dr. Chris Zarafonetis, Biomedical R&D, Inc. 

Others in attendance at this meeting included: 

BG Richard T. Travis, MC, Deputy Commander, USAMRDC 
Col. Philip Sobocinski, MSC, Special Assistant for Biotechnology 
Col. Peter J. McNelis, MSC, Project Manager/COR 
Mrs. Jean Smith, Principal Assistant Responsible for Contracting 
Dr. Edwin C. May, SRI, Principal Investigator 

In preparation for this meeting, copies of all Project reports for Fiscal Year 1986 along 
with the Scientific Oversight Committee’s comments regarding these reports and the contrac- 
tor’s responses to the comments were forwarded to each of the above-mentioned individuals 
for their review. 

The Project Review Group was asked, via correspondence (MG Russell, 12 January 
1987; Col. McNelis, 12 February 1987) and by BG Travis in his welcoming remarks at the 
meeting, to address the following questions concerning the Project: 

1. Is the science underlying this research effort essentially sound? 

2. Does the evidence to date support the existence of an anomaly? 

3. What is the potential value of this effort to the DOD? 
4. Is the research focus and level of effort appropriate? 

The agenda for the meeting is attached as Enclosure 1. Following a presentation of the 
Project’s historical antecedents, the questions listed above provided the structure for a discussion
of: FY 1986 research tasks and results, the overall plan underlying the FY 1986 effort,
and possible modifications of the plan for follow-on work. 

The Review Group's responses to the preceding questions and their recommendations for 
the Project will be presented in turn. It should be noted that there was unanimity among the 
members of the Review Group with regard to these responses. 

1. Is the science sound? 

The individual experiments conducted during Fiscal Year 1986 appear to be 
scientifically sound. The primary contractor’s response to comments of the 
Scientific Oversight Committee (SOC) leads this Review Group to conclude 
that the scientific quality of the effort is under continual qualified scrutiny, 
and immediate adjustments are made by the researchers to insure that that 
quality continues. Additionally, appropriate community-wide symposia such 
as the Theory and Proof of Principle conferences projected for FY 1987 will 
enhance that quality. 

2. Is there an anomaly? 

The results of experiments conducted by this Project during FY 1986, as well 
as other reports of previous operational related research, lead this Review 
Group to conclude that a natural anomaly exists, which we will refer to as 
Remote Viewing. 

3. Is it worthwhile? 

The Review Group believes that progress is being made in understanding this 
anomaly and that continuation of the effort is not only warranted, but entirely 
appropriate and strongly recommended. 

Should Remote Viewing be predictably reproducible and its mechanisms, 
parameters and physiological correlates understood, there would be a number 
of significant applications for the DoD. Current user agencies have reported 
utilizing the present technology with positive results. 
4. Is the direction and emphasis appropriate? 

The Review Group believes that the probability of success in demonstrating 
and explaining a phenomenon known as Remote Action is less than the 
probability of success for the Remote Viewing phenomenon. Rather than 
continuing to explore both phenomena at equal levels of effort, it is 
recommended that the results of this year’s (FY87) effort be critically 
reviewed and those areas that demonstrate the most promise be exploited and 
those that do not be terminated. The focus then would be less diffuse and 
more vertical as the more productive pathways are emphasized. 

This should not be considered an economy measure, however, since the 
vertical effort should be assured of adequate resources to accomplish its more 
definitive tasks. 

The Review Group also recommends that the Project should clarify its use of 
the terms: global/conceptual replication (i.e., other labs evidence the 
phenomena without following the same protocol), exact/technical replication 
(i.e., phenomena evidenced in other labs following the same protocol with 
other subjects and other targets), and reproducibility (i.e., phenomena 
evidenced by the same subjects over time utilizing the same randomly ordered 
target set). With this in mind, it is recommended that an effort be made to 
enhance the reproducibility of the phenomena by identifying and utilizing 
especially talented individuals. It is believed that this pool of talented 
subjects would also aid in isolating neurophysiological correlates and 
mechanisms. 


It is also recommended that one or two other secure labs be identified to 
carry out exact/technical replication of the most promising experiments 
conducted by the primary contractor. 

Overall, the current breadth of experiments selected to demonstrate and 
explicate the phenomena is appropriate, as is the present level of effort 
assigned to each of these experiments. 
In summary, the Project Review Group has determined to its satisfaction that the work 
of the Enhanced Human Performance Project is scientifically sound, appropriately managed 
and monitored, and is providing valuable insight into the nature of an anomaly which could 
have a significant impact on the DoD. 



Dr. Nick Yarn, Chairman 
Project Review Group 
APPENDIX I 

IN-HOUSE STAFFING REQUIREMENTS 


(S/NF/SG/LIMDIS) An analysis of the PAG-TA functions 
necessary to support the achievement of the long-range goals 
indicates four major functional areas which must be supported. 
Within each functional area, personnel requirements can be 
identified. A complicating factor, however, is the fact that 
some of the functional areas (such as remote viewing (RV), 
Intelligence Analysis, and ADP support) are highly specialized 
and require full-time dedicated personnel. 

1. (S/NF/SG/LIMDIS) RV Activities: RV activities can be 
grouped into the following major areas: 

a. Participate in R & D activities with the 
external R&D contractor 

b. Viewer Training (both in-house and with 
the external R&D contractor) 

c. Operational Activities 

(S/NF/SG/LIMDIS) It is difficult to project personnel 
requirements for this functional area, primarily because the 
projected level of operational activity is currently unknown. 
Based on the past level of operational tasking, it is anticipated 
that up to six personnel could be required. Five of the people 
would be involved in operational activities as well as 
participating in support of the R&D activities to be conducted by 
the external Contractor. One additional person would be 
designated to participate in operational and research support 
activities on a part-time basis but would devote most of his time 
to developing a training program and conducting training of new 
personnel and identification/selection of potential viewers. Due 
to the specialized nature of RV, this person needs to be a 
qualified viewer and not merely an administrative person. It 
should also be kept in mind that it takes approximately one year 
to train a viewer to operational status. 

2. (U) Foreign Intelligence Assessment: Support of this 
functional area may be grouped into the following activities: 

a. Data source identification/collection 

b. Construction of Foreign Activities 
Data Base 
c. Analysis 

d. Production of finished intelligence 
assessments 

(U) To adequately meet the requirements of this 
functional area, two full-time personnel will be required: a 
Senior Intelligence Officer (SIO) and an Intelligence Technician 
(IT). In order to maintain strict protocol requirements, these 
personnel should not function as operational viewers. 

(U) The IT would identify potential sources of data, 
collect the data, support the construction of the Intelligence 
database and input the required data, and assist in the 
preparation of intelligence assessments. The SIO should be an 
all-source Scientific and Technical Intelligence analyst and 
would be responsible for the identification of collection 
requirements, the analysis of intelligence data, and the 
production of finished intelligence assessments on a world-wide 
basis. 

3. (S/NF) ADP Support: Over the period of time covered 
by this Plan, the ADP support activities of PAG-TA are 
anticipated to rise dramatically, requiring one full-time person 
to function as an ADP system administrator. Several factors 
justify this position: 

a. (S/NF) PAG-TA is currently in the process 
of upgrading its ADP system to include the acquisition of a Unix- 
based SUN workstation which will not only serve as the main 
system element, but will also be used to construct the 
Intelligence and the R&D databases, serve as the communications 
link to the external Contractor, and support the operation of 
special PAG-TA research equipment. Specific areas requiring 
specialized technical attention include: 

(1) Operating system(s) 

(2) Potential LAN(s) administration 

(3) Database construction/maintenance 

(4) Language compiler(s) 

(5) Peripherals 

(6) Equipment interfaces 

(7) Data communications 

(8) System modifications/upgrades 

(9) Development of special purpose 
software to support the PAG-TA mission 
b. (C) PAG-TA is located some distance from 
the main Agency computer support facilities. Should the PAG-TA 
system experience problems or failures, the system would be down 
until someone from the main facility could travel to the PAG-TA 
location to effect repairs, resulting in a loss of productivity 
during the wait period. Also, any system modifications/upgrades 
would have to depend on the schedule of qualified personnel, 
again resulting in loss of productivity. Therefore, it is 
essential that a person with the necessary computer science 
skills be physically located at the PAG-TA facility. 

4. (S/NF/SG/LIMDIS) Branch Administration: Tasks in this 
functional area may be grouped as follows: 

a. Word Processing 

(1) Electronic Filing 

(2) Management Support 

(3) Security Administration 

(4) Report Generation/Document Preparation 

(5) RV Tasking 

(6) Generation of RV Target Pools 

b. Project/Contract Management 

c. Collection Management 

d. Ft. Meade Interface/Facilities 

5. (S/NF/SG/LIMDIS) Tasks in this area will require three 
to four personnel: a Branch Chief, a person functioning as an 
Assistant Branch Chief (probably the SIO), a Secretary and, 
possibly, a Collection Manager (unless this can be done on an “as 
required” basis by other Branch personnel). The Branch Chief and 
SIO should have experience in project/contract management, 
primarily to deal with external research/support contracts, as 
well as the ability to interface with the academic community and 
professional organizations engaged in parapsychological 
activities, in addition to the overall management skills associated 
with managing a Branch-size organization. 

(C) Based on this evaluation, a total of 11-12 
personnel could be required to effectively achieve PAG-TA goals. 
No attempt has been made to identify the personnel as either 
military or civilian. This represents an increase of 1-2 
personnel over the current authorization. However, it may be 
more desirable to keep the manning level at current strength (10 
authorized/7 assigned) and adjust the existing skill mix at PAG-TA
to more effectively meet anticipated programmatic demands 
through personnel transfers/reassignments. 
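
The 11-12 total can be reproduced by tallying the requirements stated for each functional area. The sketch below is one reading of those figures, not an official breakdown. It assumes that the "up to six" RV personnel are counted as six, and that the Assistant Branch Chief is the SIO already counted under Foreign Intelligence Assessment, so Branch Administration adds only two or three new positions. The variable names are hypothetical.

```python
# (minimum, maximum) new positions per functional area, as read from the text.
requirements = {
    "RV activities": (6, 6),                    # assumed: "up to six" taken as six
    "Foreign intelligence assessment": (2, 2),  # SIO and IT
    "ADP support": (1, 1),                      # system administrator
    # Branch Chief and Secretary, plus possibly a Collection Manager.
    # Assumed: the Assistant Branch Chief is the SIO counted above.
    "Branch administration": (2, 3),
}
AUTHORIZED = 10  # current authorization stated in the text

total_min = sum(low for low, _ in requirements.values())
total_max = sum(high for _, high in requirements.values())

print(total_min, total_max)                            # prints 11 12
print(total_min - AUTHORIZED, total_max - AUTHORIZED)  # prints 1 2
```

Under these assumptions the tally matches both the stated total of 11-12 personnel and the stated increase of 1-2 over the current authorization of 10.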










SECRET 

NOT RELEASABLE TO FOREIGN NATIONALS 
STAR GATE 
LIMDIS 

1-4 


Approved For Release 2003/04/18 : CIA-RDP96-00789R002700020001-0 


